PREFACE


The study of a problem by statistical methods usually involves three
stages: (1) the collection of material or data; (2) the mathematical
analysis of the data thus collected; (3) the interpretation of results,
for the particular purpose in view.

As to stage (1), the best methods of collecting data depend almost 
entirely on the nature of the particular field of inquiry, and are not dis- 
cussed in this Handbook. The same is true in regard to stage (3); the 
problems connected with the interpretation of statistical results are 
necessarily very different in different fields of inquiry, and are not
discussed in this Handbook, except as illustrations of the mathematical
methods involved. 

The problems of stage (2), on the other hand, are in a sense common to 
all fields of statistical inquiry. Whatever the content of the data may be, 
the form of the mathematical analysis is essentially the same. It is with 
these formal problems of mathematical analysis that this Handbook 
deals. Illustrations are taken from this or that particular field, for the 
sake of concreteness; but the general applicability of the methods to all 
fields is constantly borne in mind, and the terminology throughout the 
Handbook is kept as non-special as possible. 

Special emphasis is laid on the limitations surrounding the proper 
application of the various methods of analysis. Without careful atten- 
tion to these limitations, the results of a statistical inquiry may be alto- 
gether misleading. 

Each chapter has been critically read by at least two other contributors 
besides the author; but the final responsibility for all the chapters rests 
with the individual authors. 

The National Research Council contributed to the preparation of the 
Handbook by the grant of funds for traveling expenses incident to 
meetings of the Committee and for a small amount of clerical assist- 
ance. The royalties from the book are received by the National 
Research Council to be made available, if needed, for further work 
in the field of mathematical statistics.
CHAPTER I


MATHEMATICAL MEMORANDA 
By E. V. HUNTINGTON 


NUMERICAL COMPUTATION 


Slide rules, tables, and computing machines. Before undertaking 
any statistical work one should supply one’s self with suitable aids to 
computation. 

For three-figure accuracy, a ten-inch slide rule is very convenient. 
The larger Fuller or Thacher slide rules give four, or sometimes five, 
significant figures. Barlow’s Tables of squares, square roots, cube 
roots, and reciprocals, are almost indispensable. The multiplication 
tables are also often convenient. Crelle’s Table gives the product of 
every three-figure number by every three-figure number. Peters’s Table 
gives the product of every four-figure number by every two-figure num- 
ber. The smaller table of H. Zimmermann gives the product of every 
three-figure number by every two-figure number. Tables of logarithms 
of numbers, and for certain purposes tables of trigonometric functions, 
are invaluable. Four- and five-place tables exist in great variety. If 
more than five figures are required, use Bremiker’s six-place table or pro- 
ceed at once to a seven-place table: for example, Vega. For eight 
places use the two-volume table of Bauschinger and Peters. Explana- 
tions of the use of tables of logarithms usually accompany the tables 
themselves; see, for example, E. V. Huntington’s Handbook of Mathe- 
matics for Engineers. 

In extended work some form of computing machine will soon pay for 
itself in spite of the apparently large initial expense. The best-known 
adding and listing machines are the Burroughs and the Wales, with 
standard keyboards, and the Dalton and the Sundstrand with ten-key 
keyboards. (The wide-paper form of carriage is more convenient for 
most purposes than the narrow-ribbon type.) Among the calculating 
machines may be mentioned the Comptometer, the Burroughs non-listing
machine, the Monroe calculator, the Millionaire, the Brunsviga,
the Ensign, the Mercedes-Euclid, and the Marchant. Some of these 
can be operated by electricity. 

For elaborate classification of large amounts of statistical data, as 
in the work of the Census Bureau, the Hollerith or the Powers machine 
for sorting punched cards is practically indispensable. 

In advanced work in statistical theory, Pearson’s Tables for Statisti- 
cians and Biometricians are invaluable. 

The new Tables for Applied Mathematics, by J. W. Glover, include in
one volume a large number of tables for finance, insurance, and statistics,
together with a seven-place table of logarithms.

Absolute and relative errors. The numerical data in a statistical
computation are usually the result of measurement, observation, or 
estimate, and hence are only approximately correct. The closeness of 
the approximation may be measured either by the absolute error or by 
the relative error. 

The absolute error is sometimes defined as the observed value minus
the true value ($x_1 - X$) and sometimes as the true value minus the
observed value ($X - x_1$). When the distinction of sign is important,
the error $x_1 - X$ may be called the deviation of the observed value from
the true value (a positive deviation being an “error in excess,” and a
negative deviation an “error in defect”), while the error $X - x_1$ may
be called the correction to be applied to the observed quantity (the
correction being positive or negative according as the observed quantity
needs to be increased or decreased).

The relative error is the absolute error divided by the true value. 

For example, suppose $x_1 = 3.06$ cm. and $x_2 = 2.97$ cm. are two
approximate values and $X = 3.00$ cm. is the true value. Then the
absolute error of $x_1$ is 0.06 cm. (deviation = +0.06 cm., correction =
-0.06 cm.) while the relative error is 0.02, or 2 per cent. Similarly, the
absolute error of $x_2$ is 0.03 cm. (deviation = -0.03 cm., correction
= +0.03 cm.) while the relative error is 0.01, or 1 per cent.
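
The worked example above can be reproduced with a few lines of Python. This is only an illustrative sketch; the function names are chosen here for convenience and are not part of the Handbook.

```python
def deviation(observed, true_value):
    """Observed minus true: a positive value is an error in excess."""
    return observed - true_value


def correction(observed, true_value):
    """True minus observed: the amount to be added to the observed value."""
    return true_value - observed


def relative_error(observed, true_value):
    """Absolute error divided by the true value."""
    return abs(observed - true_value) / abs(true_value)


X = 3.00  # true value, in cm
for x in (3.06, 2.97):  # the two approximate values of the example
    print(f"x = {x:.2f} cm: deviation = {deviation(x, X):+.2f} cm, "
          f"correction = {correction(x, X):+.2f} cm, "
          f"relative error = {relative_error(x, X):.0%}")
```

The deviation and the correction always differ only in sign, while the relative error carries no sign at all.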

The absolute error is connected with the number of decimal places,
and is important when the quantity is to be added or subtracted, or
compared with other quantities on an absolute basis. For example, a
measurement may be “correct to two decimal places”; an estimated
population may be “correct to the nearest million,” etc.

The relative error, on the other hand, is connected with the number 
of significant figures, and is important when the quantity is to be multi- 
plied or divided, or compared with another quantity on a percentage 
basis. For example, a number may be said to be “correct to four
significant figures,” “correct to within 3 per cent of the value,” “correct
within one part in 6000,” etc.

In any statistical investigation, either the desired number of decimal 
places, or more usually, the desired number of significant figures should 
be decided upon in advance, and borne constantly in mind throughout 
the work. 

Propagation of error in computation. The manner in which small 
errors in the data may accumulate in the course of a computation is in- 
dicated as follows: 

(1) In addition: Suppose, for example, that each of 20 numbers has 
a possible error of half a unit in the third decimal place; then the sum 
of these numbers may have a possible error of 10 units in the third 
decimal place — that is, an error of 1 unit in the second decimal place. 
All figures beyond the second decimal place should therefore be dis- 
carded in the answer. In general, one doubtful figure in any column 
will render that whole column doubtful; hence all figures to the right of 
that column should be discarded in the answer. 

(2) In subtraction: Two numbers may each be correct, say, to five 
significant figures, and yet their difference may be correct to only one or 
two significant figures; for example, 3.1416 — 3.1402 = 0.0014. Neg- 
lect of this fact is a frequent source of overconfidence in regard to the 
precision of a result. 

(3) In multiplication and division: The number of significant figures 
which can be relied on in a product or quotient is never greater than the 
number of reliable significant figures in the weakest factor. 

The relative error of a product or quotient may be as great as the sum 
of the relative errors of the separate items. 

(4) In powers and roots: The relative error in the nth power of a
number is n times the relative error in the number itself. Similarly,
the relative error in $\sqrt[n]{x}$ is only 1/nth of the relative error in
$x$ itself.

(5) In exponents and logarithms: If $y = e^x$, or $x = \log_e y$, then an
absolute error of say .01 in $x$ corresponds approximately to a relative
error of .01 in $y$.

(6) In arithmetic or geometric mean: The relative error in the 
arithmetic or geometric mean of a number of quantities will be approxi- 
mately the same as the relative error of the individual items (greater 
than the least of these relative errors and less than the greatest of them). 
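
Several of the rules above can be checked numerically. The following Python sketch uses illustrative values (x = 2, a relative error of 0.001, and an exponent of 1.5) that are assumptions made here, not figures from the text.

```python
import math


def relative_error(approx, true_value):
    return abs(approx - true_value) / abs(true_value)


# (1) Addition: 20 numbers, each uncertain by half a unit in the third place.
worst_case_sum_error = sum([0.0005] * 20)           # about 0.01

# (2) Subtraction: each number is uncertain by 0.00005, yet the difference
# is so small that its relative error is several per cent.
difference = 3.1416 - 3.1402                         # about 0.0014
difference_rel_error = (0.00005 + 0.00005) / difference

# (4) Powers and roots: a relative error e in x becomes about n*e in x**n
# and about e/n in the nth root of x.
x, e, n = 2.0, 0.001, 3
power_rel_error = relative_error((x * (1 + e)) ** n, x ** n)
root_rel_error = relative_error((x * (1 + e)) ** (1 / n), x ** (1 / n))

# (5) Exponentials: an absolute error of .01 in x gives a relative error
# of about .01 in e**x.
exp_rel_error = relative_error(math.exp(1.5 + 0.01), math.exp(1.5))

print(worst_case_sum_error, difference_rel_error)
print(power_rel_error, root_rel_error, exp_rel_error)
```

The subtraction case is the one most easily overlooked: two five-figure numbers yield a difference good to barely two figures.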

Rejection of superfluous figures. It is a fundamental rule of
computation that a result should never be stated to a greater degree of
precision than is justified by the data. All superfluous digits are
misleading and should be rejected from the result.


4 HANDBOOK OF MATHEMATICAL STATISTICS 


If the first rejected figure is 5 or more, the preceding figure should be
increased by one; otherwise, it should be left unchanged.¹

For example, 3.14159 reduced to four figures is 3.142. Again, 6.1297
reduced to four figures is 6.130. Note that in a decimal fraction a
final zero is as significant as any other final digit in determining the
degree of precision. But in the case of a whole number like 3140000 the
final zeros leave the reader in doubt whether the number of reliable
significant figures is 3, 4, 5, 6, or 7. This ambiguity can be removed by
placing a mark over the last reliable figure of 3140000 (the 4, or the
first zero, or the second zero, etc.), as the case may require; or, more
usually, by writing the number in the form $3.14 \times 10^6$, or
$3.140 \times 10^6$, or $3.1400 \times 10^6$, etc., as the case may require.

This latter “notation by powers of 10” should always be used in the
case of very large or very small numbers. For example,

$0.000003140 = 3.140 \times 10^{-6}$.

(Note: In this notation, the exponent of the power of 10 is the same
as the “characteristic” of the logarithm of the number.)
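
The rounding rule of this section, together with the refinement for a rejected figure of exactly 5, can be sketched with Python's `decimal` module. The helper name `round_sig` is hypothetical, and the mapping of the two rules onto `ROUND_HALF_UP` and `ROUND_HALF_EVEN` is an interpretation made here.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP


def round_sig(value, figures, rounding=ROUND_HALF_UP):
    """Reduce a decimal string to the given number of significant figures."""
    d = Decimal(value)
    if d == 0:
        return d
    # Exponent of the last figure to be kept.
    exponent = d.adjusted() - figures + 1
    return d.quantize(Decimal(1).scaleb(exponent), rounding=rounding)


print(round_sig("3.14159", 4))                  # ordinary rule
print(round_sig("6.1297", 4))                   # the final zero is kept
print(round_sig("2.5", 1, ROUND_HALF_UP))       # exactly 5: raised
print(round_sig("2.5", 1, ROUND_HALF_EVEN))     # exactly 5: even figure left alone
```

Working from decimal strings rather than binary floats keeps a rejected "exactly 5" exact, which is what the refinement of the rule depends on.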


DEFINITIONS OF VARIOUS KINDS OF MEANS OR AVERAGES 


(1) The arithmetic mean (AM) of n numbers, $x_1, x_2, \cdots, x_n$, is
1/nth of their sum:

$AM = \frac{1}{n}(x_1 + x_2 + \cdots + x_n)$; or $AM = \frac{1}{n}\sum x$.


The AM is what is ordinarily meant when the term “mean” or
“average” is used without further qualification. It is related to the
center of gravity (or centroid) in mechanics, the center of gravity of
a set of n equal particles being a point whose distance from any fixed
plane is the AM of the distances of the several particles from that plane.
It is also related to the “method of least squares,” since the sum of the
squares of the deviations of the n numbers from any value X is a
minimum when X is the AM of the numbers.

In computing an AM note that adding any constant, + h, to all the 
numbers has the effect of adding + h to their AM. 

For two numbers, a and b, the AM $= \tfrac{1}{2}(a + b)$.

(2) The geometric mean (GM) of n (positive) numbers, $x_1, x_2, \cdots, x_n$,
is the nth root of their product:

$GM = \sqrt[n]{x_1 x_2 \cdots x_n}$.


¹ A refinement of this rule is sometimes to be recommended, namely: if the
rejected figures are exactly 5000 ···, the preceding figure should be raised
when it is odd and left unchanged when it is even.


In computing the GM of n numbers, it is usually convenient to use the
formula

$\log(GM) = \frac{1}{n}(\log x_1 + \log x_2 + \cdots + \log x_n)$, or $\log(GM) = \frac{1}{n}\sum(\log x)$,

that is, take the AM of the logarithms of the numbers, and then take 
the anti-log of the result. 

For two numbers, a and b, the GM is $x = \sqrt{ab}$. This is called also
the mean proportional between a and b, since $a : x = x : b$. By drawing a
semicircle on a + b as diameter, the value of x can be constructed
geometrically, as in Figure 1.


(3) The harmonic mean (HM) of n (positive) numbers, $x_1, x_2, \cdots, x_n$,
is the reciprocal of the arithmetic mean of the reciprocals of the numbers:

$HM = \dfrac{1}{\frac{1}{n}\left(\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}\right)}$.


The chief use of the HM is in averaging frequencies, 1/x being called a
frequency when x is a duration.

For example, in steamship statistics, the average number of trips 
per year may be more significant than the average number of days spent 
on each trip. 


For two numbers, a and b, $HM = \dfrac{2ab}{a + b}$.

The AM, GM, and HM are the so-called classical means, known to the
Greeks.

(4) The contra-harmonic mean (CHM) is almost as old, but is of
very slight importance:

$CHM = \dfrac{x_1^2 + x_2^2 + \cdots + x_n^2}{x_1 + x_2 + \cdots + x_n}$, or $CHM = \sum(x^2)/\sum(x)$.

(5) The root-mean-square (RMS) of n numbers, $x_1, x_2, \cdots, x_n$, is
the square root of the arithmetic mean of their squares:

$RMS = \sqrt{\frac{1}{n}(x_1^2 + x_2^2 + \cdots + x_n^2)}$, or $RMS = \sqrt{\frac{1}{n}\sum(x^2)}$.


The RMS is related to the radius of gyration in mechanics, the radius 
of gyration of a set of n equal particles, with respect to a given axis, 
being the RMS of the radial distances of the several particles from that 
axis. In statistics, the RMS of the deviations of a set of numbers from
their arithmetic mean is called the standard deviation (SD) of those
numbers.

The standard deviation of a set of numbers can also be found from the
differences between the numbers taken two by two; thus, if $\bar{x}$ is
the AM of the numbers, then

$SD = \sqrt{\frac{1}{n}\sum(x_i - \bar{x})^2} = \sqrt{\frac{1}{n^2}\sum_{i<j}(x_i - x_j)^2}$,

where n(n - 1)/2 = the number of the differences in question.

For any positive numbers, $x_1 \le x_2 \le \cdots \le x_n$, the order of
magnitude of these five means is as follows (unless the numbers are all
equal):

$x_1 < HM < GM < AM < RMS < CHM < x_n$.
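
The five means and their order of magnitude can be illustrated numerically. The sketch below uses an arbitrary sample, (1, 2, 4, 8), chosen here for illustration. The pairwise form of the standard deviation is computed under the normalisation $SD^2 = \frac{1}{n^2}\sum_{i<j}(x_i - x_j)^2$, an identity that the code checks directly.

```python
import math


def am(xs):
    return sum(xs) / len(xs)


def gm(xs):
    # AM of the logarithms, then the anti-log.
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def hm(xs):
    return len(xs) / sum(1 / x for x in xs)


def chm(xs):
    return sum(x * x for x in xs) / sum(xs)


def rms(xs):
    return math.sqrt(sum(x * x for x in xs) / len(xs))


def sd(xs):
    mean = am(xs)
    return rms([x - mean for x in xs])


def sd_from_pairs(xs):
    n = len(xs)
    total = sum((xs[i] - xs[j]) ** 2 for i in range(n) for j in range(i + 1, n))
    return math.sqrt(total) / n


sample = [1.0, 2.0, 4.0, 8.0]
print(hm(sample), gm(sample), am(sample), rms(sample), chm(sample))
print(sd(sample), sd_from_pairs(sample))
```

For two numbers the same functions also confirm the special relations between the means, for example that the product of the HM and the AM is the square of the GM.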


For the special case of two numbers, a and b, the following facts may 
be noted : 
The GM of two numbers is the GM between their HM and their AM. 
The AM of two numbers is the AM between their HM and their 
CHM. 
The RMS of two numbers is the GM between their AM and their 
CHM. 
The following general formulas, due to Mr. R. M. Foster, may also
be noted:

$M = \left[(x_1^k + x_2^k + \cdots + x_n^k)/n\right]^{1/k}$,

$M' = \dfrac{x_1^{k+1} + x_2^{k+1} + \cdots + x_n^{k+1}}{x_1^k + x_2^k + \cdots + x_n^k}$.

If    k  =  -∞     -1     0      1      2      +∞
then  M  =  x_1    HM     GM     AM     RMS    x_n
and   M' =  x_1    HM     AM     CHM           x_n


(The proof involves the evaluation of certain simple indeterminate 
forms.) 
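
The two general formulas can be tried numerically. In the sketch below the function names `general_mean` and `general_mean_prime` are labels chosen here for M and M'; the case k = 0 of M is returned as its limiting value (the GM), and the infinite cases as the smallest and largest numbers.

```python
import math


def general_mean(xs, k):
    """M = [(x_1^k + ... + x_n^k)/n]^(1/k), with the limiting cases handled."""
    if k == 0:
        return math.exp(sum(math.log(x) for x in xs) / len(xs))  # limit: GM
    if k == math.inf:
        return max(xs)
    if k == -math.inf:
        return min(xs)
    return (sum(x ** k for x in xs) / len(xs)) ** (1 / k)


def general_mean_prime(xs, k):
    """M' = (sum of x^(k+1)) / (sum of x^k)."""
    return sum(x ** (k + 1) for x in xs) / sum(x ** k for x in xs)


sample = [1.0, 2.0, 4.0, 8.0]
for k in (-1, 0, 1, 2):
    print(k, general_mean(sample, k), general_mean_prime(sample, k))
```

Evaluating M for a very small k, such as 0.000001, gives a value close to the GM, which is the indeterminate form the proof has to resolve.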

(6) The median of a set of quantities is, roughly speaking, the middle
one of the set, when they are arranged in order of magnitude (i.e.
“arrayed”). If the number of quantities is even, and the two middle
quantities are not equal, the median is commonly taken as the number
halfway between them. More exactly, the median, in this case, is a
number X uniquely determined by the equation

$(X - a_1)(X - a_2) \cdots (X - a_k) = (a_{k+1} - X)(a_{k+2} - X) \cdots (a_n - X)$,

where $a_1, a_2, \cdots, a_k$ are the quantities of the lower half, and
$a_{k+1}, a_{k+2}, \cdots, a_n$ the quantities of the upper half of the set.
(D. Jackson, Bull. Amer. Math. Soc., Jan. 1921.)
The sum of the absolute deviations of n numbers from any value X
is a minimum when X is the median of those numbers.
(7) The mode of a set of quantities is that quantity which occurs
most often (i.e. is the most fashionable), if such a quantity exists. Any
quantity which occurs more often than any other quantity near it in
size may be called a relative mode (or simply a mode) of the set.

A general mathematical formula including the arithmetic mean, the
median, and the mode is due to D. Jackson and R. M. Foster: Let X be
the value of x which minimizes $\sum |x_i - x|^p$. Then if p = 2, X = the
arithmetic mean; if p → 1, lim X = the median; if p → 0, lim X = the
mode. We note also that if p → ∞, lim X = $\tfrac{1}{2}(x_1 + x_n)$, where
$x_1$ is the smallest and $x_n$ the largest of the given quantities.
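
The minimizing property can be explored numerically for p not less than 1, where the sum is a convex function of x. The sketch below uses a simple ternary search and an arbitrary sample; both are choices made here for illustration. The case p → 0 (the mode) is not convex and is left out.

```python
def minimizer(xs, p, iterations=200):
    """Ternary search for the x that minimizes sum(|x_i - x|**p), for p >= 1."""
    lo, hi = min(xs), max(xs)

    def cost(x):
        return sum(abs(xi - x) ** p for xi in xs)

    for _ in range(iterations):
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        if cost(a) < cost(b):
            hi = b      # the minimum lies to the left of b
        else:
            lo = a      # the minimum lies to the right of a
    return (lo + hi) / 2


sample = [1.0, 2.0, 3.0, 10.0, 20.0]   # AM = 7.2, median = 3, mid-range = 10.5
print(minimizer(sample, 2))        # close to the arithmetic mean
print(minimizer(sample, 1.001))    # close to the median
print(minimizer(sample, 50))       # close to half the sum of the extremes
```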

In the case of the median and the mode (as in the case of the AM),
adding a constant, + h, to all the numbers has the effect of adding the
same constant to the mean. (This is not true in case of the other four
types of means.)

The following general properties are often useful:

In the case of any one of the seven means, multiplying all the numbers
by a constant factor, c, has the effect of multiplying the mean by the
same constant, c. (“Change of scale.”)

In computing the AM, GM, HM, or RMS of n numbers, it is allow- 
able, after grouping the numbers in any way, to replace each number 
of any group by the corresponding mean of that group. (This is not al- 
lowable in the case of the CHM, the median, or the mode.) 

Weighted means. If the given numbers $x_1, x_2, \cdots, x_n$ have different
degrees of importance, as indicated by “weights” $w_1, w_2, \cdots, w_n$, then
we may speak of the weighted mean of these numbers (of any one of the
seven kinds). Any kind of weighted mean of the given set of n numbers
is defined as the corresponding kind of simple mean of a set of W
numbers, in which $x_1$ occurs $w_1$ times, $x_2$ occurs $w_2$ times, etc., and
$W = w_1 + w_2 + \cdots + w_n$ is the sum of the weights.

For example, the weighted arithmetic mean is
$\frac{1}{W}(w_1x_1 + w_2x_2 + \cdots + w_nx_n)$; the weighted geometric mean is
$(x_1^{w_1} x_2^{w_2} \cdots x_n^{w_n})^{1/W}$; etc.
The term “weighted mean,” or “weighted average,” used without
qualifying adjective, usually indicates the weighted arithmetic mean.
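
The definition by repetition can be checked against the formulas directly when the weights are whole numbers. The sample values and weights below are illustrative assumptions.

```python
import math


def weighted_am(xs, ws):
    return sum(w * x for x, w in zip(xs, ws)) / sum(ws)


def weighted_gm(xs, ws):
    # Weighted AM of the logarithms, then the anti-log.
    return math.exp(sum(w * math.log(x) for x, w in zip(xs, ws)) / sum(ws))


xs, ws = [2.0, 4.0, 8.0], [1, 2, 1]

# The same set written out with each number repeated w times.
expanded = [x for x, w in zip(xs, ws) for _ in range(w)]

print(weighted_am(xs, ws), sum(expanded) / len(expanded))
print(weighted_gm(xs, ws), math.prod(expanded) ** (1 / len(expanded)))
```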


PERMUTATIONS AND COMBINATIONS. THE BINOMIAL 
THEOREM 


Permutations. The number of possible permutations or arrangements
of n different elements is “n factorial” $= n! = 1 \cdot 2 \cdot 3 \cdots n$.
Another notation is $\lfloor n = n!$




Thus, the three letters a, b, c admit 3! = 6 permutations: abc, acb, 
bac, bca, cab, cba. 

If among the n elements there are p equal ones of one sort, q equal
ones of another sort, r equal ones of a third sort, etc., where
p + q + r + ··· = n, then the number of possible permutations is

$(n!)/[(p!)(q!)(r!) \cdots]$.

Thus, the four letters a, b, b, b admit 24/[(1)(6)] = 4 permutations:
abbb, babb, bbab, bbba.


Combinations. The number of possible combinations or groups of
n elements taken r at a time (without repetition of any element within
any one group) is ${}_nC_r = \dfrac{n!}{(n-r)!\,r!}$ = the coefficient of the
term in $x^r$ in the binomial expansion of $(1 + x)^n$. (Notice that
${}_nC_r = {}_nC_{n-r}$.)

Thus, the five letters abcde taken two at a time give ${}_5C_2 = 10$
combinations: ab, ac, ad, ae, bc, bd, be, cd, ce, de.

If repetitions are allowed within each group, then the number of
combinations of n elements taken r at a time is ${}_{n+r-1}C_r$.

Thus, five letters taken two at a time, repetitions allowed, give
${}_6C_2 = 15$ combinations: aa, ab, ac, ad, ae, bb, bc, bd, be, cc, cd, ce,
dd, de, ee.

The general principle underlying the theory of permutations and
combinations is this: If we can do one thing in m ways and another
thing in n ways, then we can do both things together in mn ways.
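
The counts quoted in the examples of this section can be confirmed by direct enumeration with Python's standard library. The helper `arrangements_with_repeats` is a name chosen here for the formula with equal elements.

```python
import math
from itertools import combinations, combinations_with_replacement, permutations


def arrangements_with_repeats(counts):
    """n!/(p! q! r! ...) for groups of equal elements of sizes p, q, r, ..."""
    total = math.factorial(sum(counts))
    for c in counts:
        total //= math.factorial(c)
    return total


print(len(list(permutations("abc"))))                           # 3! arrangements
print(sorted("".join(p) for p in set(permutations("abbb"))))    # distinct arrangements
print(["".join(c) for c in combinations("abcde", 2)])
print(["".join(c) for c in combinations_with_replacement("abcde", 2)])
```

`math.comb(n, r)` gives the same numbers without enumeration, which is the practical route when n is large.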


The binomial theorem. If n is any positive integer,

$(p + q)^n = p^n + np^{n-1}q + \dfrac{n(n-1)}{2!}p^{n-2}q^2 + \cdots + {}_nC_r\,p^{n-r}q^r + \cdots + npq^{n-1} + q^n$,

where ${}_nC_1 = n$, ${}_nC_2 = [n(n-1)]/(2!)$, ${}_nC_3 = [n(n-1)(n-2)]/(3!)$, $\cdots$,
${}_nC_r = [n(n-1)(n-2) \cdots (n-r+1)]/(r!)$
are the binomial coefficients.

Note that ${}_nC_r = {}_nC_{n-r} = \dfrac{n!}{(n-r)!\,r!}$.

Other notations are ${}_nC_r = \binom{n}{r} = (n)_r$.
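TABLE OF BINOMIAL COEFFICIENTS

 n   r=0    1    2    3    4    5    6    7    8    9   10
 0     1
 1     1    1
 2     1    2    1
 3     1    3    3    1
 4     1    4    6    4    1
 5     1    5   10   10    5    1
 6     1    6   15   20   15    6    1
 7     1    7   21   35   35   21    7    1
 8     1    8   28   56   70   56   28    8    1
 9     1    9   36   84  126  126   84   36    9    1
10     1   10   45  120  210  252  210  120   45   10    1
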

Note that each number, plus the number on its left, gives the number 
next below. 


STIRLING’S FORMULA. THE BERNOULLI NUMBERS 


Stirling’s formula for n factorial. The following formula gives a good
approximation to n! for large values of n:

$n! \approx \sqrt{2\pi n}\; n^n e^{-n}$, or, more accurately,

$n! = \sqrt{2\pi n}\; n^n e^{-n} e^{\theta/(12n)}$, where $0 < \theta < 1$,

whence

$\log_e(n!) = (n + \tfrac{1}{2})\log_e n - n + \log_e\sqrt{2\pi} + \dfrac{\theta}{12n}$,

or $\log_{10}(n!) = (n + \tfrac{1}{2})\log_{10} n - (0.434294482\,n) + 0.39909 + \dfrac{0.0362\,\theta}{n}$.

The last term, in which $0 < \theta < 1$, indicates the degree of
approximation attained. For example, if n = 1000, $\log_{10}(1000!) = 2567.6046$,
so that $1000! = 4.024 \times 10^{2567}$.

A seven-place table of $\log_{10}(n!)$ up to n = 1000 is given in Pearson’s
Tables, page 98, and in Glover’s Tables, page 482.

A still more accurate approximation is

$\log_e(n!) = (n + \tfrac{1}{2})\log_e n - n + \log_e\sqrt{2\pi} + \dfrac{B_1}{1 \cdot 2 \cdot n} - \dfrac{B_3}{3 \cdot 4 \cdot n^3} + \dfrac{B_5}{5 \cdot 6 \cdot n^5} - \dfrac{B_7}{7 \cdot 8 \cdot n^7} + \theta\,\dfrac{B_9}{9 \cdot 10 \cdot n^9}$,

where $B_1 = \tfrac{1}{6}$, $B_3 = \tfrac{1}{30}$, $B_5 = \tfrac{1}{42}$, $\cdots$ are the
Bernoulli numbers (see below), and $0 < \theta < 1$.
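
The example for n = 1000 can be checked against the log-gamma function of Python's standard library, which serves here only as an independent reference value. The function names are illustrative.

```python
import math


def log10_factorial_stirling(n):
    """(n + 1/2) log10 n - n log10 e + log10 sqrt(2 pi), without the theta term."""
    return ((n + 0.5) * math.log10(n)
            - n * math.log10(math.e)
            + 0.5 * math.log10(2 * math.pi))


def log10_factorial_reference(n):
    """Reference value from log(n!) = lgamma(n + 1)."""
    return math.lgamma(n + 1) / math.log(10)


n = 1000
approx = log10_factorial_stirling(n)
reference = log10_factorial_reference(n)
print(f"Stirling:  {approx:.7f}")
print(f"reference: {reference:.7f}")
print(f"omitted term: {reference - approx:.2e}")
```

The omitted term is positive and smaller than one unit in the fourth decimal place, in keeping with the bound on the last term of the formula.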
Wallis’s formula for π. Wallis’s formula (useful in connection with
the proof of Stirling’s formula) is an infinite product the limit of which
is π/2:

$\dfrac{\pi}{2} = \dfrac{2}{1} \cdot \dfrac{2}{3} \cdot \dfrac{4}{3} \cdot \dfrac{4}{5} \cdot \dfrac{6}{5} \cdot \dfrac{6}{7} \cdots \dfrac{2n}{2n-1} \cdot \dfrac{2n}{2n+1} \cdots$


The Bernoulli numbers. The following numbers occur in the expansion
of many functions, such as $\tan x$, $\sec x$, $x/(e^x - 1)$, etc.

B_1  = 1/6            B_2  = 1
B_3  = 1/30           B_4  = 5
B_5  = 1/42           B_6  = 61
B_7  = 1/30           B_8  = 1385
B_9  = 5/66           B_10 = 50521
B_11 = 691/2730       B_12 = 2702765
B_13 = 7/6            B_14 = 199360981
B_15 = 3617/510       B_16 = 19391512145
B_17 = 43867/798      B_18 = 2404879675441
B_19 = 174611/330     B_20 = 370371188237525
etc.                  etc.

The numbers $B_2, B_4, B_6, \cdots$ are sometimes denoted by $E_2, E_4, E_6, \cdots$
or by $E_1, E_2, E_3, \cdots$; while the numbers $B_1, B_3, B_5, \cdots$ are sometimes
denoted by $B_2, B_4, B_6, \cdots$ or by $B_1, B_2, B_3, \cdots$

For recursion formulas, see B. O. Peirce, Table of Integrals. For an 
extended table, see Glover’s Tables. For large values of n, the following 
approximations are useful : 


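$\dfrac{B_{2n-1}}{(2n)!} = \dfrac{2}{(2\pi)^{2n}}\left[1 + \dfrac{1}{2^{2n}} + \dfrac{1}{3^{2n}} + \dfrac{1}{4^{2n}} + \cdots\right]$,

$\dfrac{B_{2n}}{(2n)!} = \dfrac{2^{2n+2}}{\pi^{2n+1}}\left[1 - \dfrac{1}{3^{2n+1}} + \dfrac{1}{5^{2n+1}} - \dfrac{1}{7^{2n+1}} + \cdots\right]$.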


THE GAMMA FUNCTION 


The Gamma Function of any positive number n is defined by

$\Gamma(n) = \int_0^\infty x^{n-1} e^{-x}\,dx$.

If n is a positive integer, $\Gamma(n + 1) = n!$. (See Stirling’s formula, above.)
In general, $\Gamma(n + 1) = n\Gamma(n)$, so that the value of $\Gamma(n)$ for any
positive n can be found, by successive reductions, from a table covering the
range from any integer to the succeeding integer, as, for example, from
n = 1 to n = 2. In particular,

$\Gamma(0) = \infty$, $\Gamma(\tfrac{1}{2}) = \sqrt{\pi}$, $\Gamma(1) = 1$, $\Gamma(2) = 1$, $\Gamma(3) = 2$.
The graph of the function is shown in Figure 2. The minimum point
is given by I'(1.4616321) = .8856032. Tables of the 
Gamma Function are given in Pearson’s Tables and in 
Glover’s Tables. 
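
The method of successive reductions can be sketched in Python. Here `math.gamma`, restricted to arguments between 1 and 2, stands in for the printed table; the function name `gamma_by_reduction` is an assumption made for illustration.

```python
import math


def gamma_by_reduction(n):
    """Gamma(n) for n > 0, using only 'table' values on the range 1 <= n < 2."""
    factor = 1.0
    while n >= 2:          # Gamma(n) = (n - 1) * Gamma(n - 1)
        n -= 1
        factor *= n
    while n < 1:           # Gamma(n) = Gamma(n + 1) / n
        factor /= n
        n += 1
    return factor * math.gamma(n)   # table look-up on the range 1 to 2


print(gamma_by_reduction(0.5), math.sqrt(math.pi))
print(gamma_by_reduction(4.0))              # equals 3!
print(math.gamma(1.4616321))                # value at the minimum point
```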

The Beta Function. The Beta Function of any two
positive numbers, m and n, is defined by

$B(m, n) = \int_0^1 x^{m-1}(1 - x)^{n-1}\,dx = 2\int_0^{\pi/2} \sin^{2m-1}\theta\,\cos^{2n-1}\theta\,d\theta = \dfrac{\Gamma(m)\,\Gamma(n)}{\Gamma(m + n)}$.

The hypergeometric series. The hypergeometric series is a function
of x involving three parameters, a, b, c:

$F(a, b, c, x) = 1 + \dfrac{a \cdot b}{1 \cdot c}\,x + \dfrac{a(a+1)\,b(b+1)}{1 \cdot 2 \cdot c(c+1)}\,x^2 + \dfrac{a(a+1)(a+2)\,b(b+1)(b+2)}{1 \cdot 2 \cdot 3 \cdot c(c+1)(c+2)}\,x^3 + \cdots$

$= \dfrac{\Gamma(c)}{\Gamma(b)\,\Gamma(c - b)} \int_0^1 t^{b-1}(1 - t)^{c-b-1}(1 - xt)^{-a}\,dt$.


GAUSS’S NORMAL ERROR CURVE, OR PROBABILITY CURVE 


Constants of the normal curve. The most important constants
connected with the normal curve of error are the following (see Figure 3):

$y_0$ = maximum ordinate (where x = 0), or height at the “mode.”

A = total area, from x = -∞ to x = +∞. If the curve is given
by a finite number of equi-spaced ordinates, then, approximately,
A = N·Δx, where N = total length of the ordinates
(“total population”), and Δx = distance between the
ordinates (“class interval”).

σ = “standard deviation” or “root-mean-square error” = abscissa
of point of inflection, given by

$\sigma^2 = \dfrac{1}{A}\int_{-\infty}^{\infty} x^2 y\,dx$, or, approximately, $\sigma^2 = \dfrac{1}{N}\sum(x^2 y)$.

p = “probable error” = value of the abscissa such that the area
from x = -p to x = +p is half the total area A. Here
p = (ρ√2)σ = 0.674489749 σ, where ρ = 0.476936276 ··· is a
number defined by the equation

$\dfrac{2}{\sqrt{\pi}}\int_0^{\rho} e^{-t^2}\,dt = \dfrac{1}{2}$.
- Tv 0 
Less important constants are:

1/h = “modulus” = σ√2. Here h = 1/(σ√2) is called the “measure
of precision.”

η = “mean absolute error” = $\sigma\sqrt{2/\pi}$ = 0.797884561 σ.
Here $\eta = \dfrac{2}{A}\int_0^{\infty} x y\,dx$, or, approximately, $\eta = \dfrac{1}{N}\sum(|x| \cdot y)$.

Note that σ, p, 1/h, and η are quantities of the same kind as x, their
order of magnitude being indicated in Figure 3.


Relations between the constants are shown in the following table
(in which ρ = 0.47694, as defined above):

y_0 = A/(σ√(2π)) = 0.39894 A/σ;   = (ρ/√π)(A/p) = 0.26908 A/p;   = Ah/√π = 0.56419 Ah;   = A/(πη) = 0.31831 A/η
A   = (σ√(2π)) y_0 = 2.50663 σ y_0;   = (√π/ρ) p y_0 = 3.71633 p y_0;   = √π y_0/h = 1.77245 y_0/h;   = π y_0 η = 3.14159 y_0 η
σ   = p/(ρ√2) = 1.48260 p;   = (1/√2)(1/h) = 0.70711 (1/h);   = η√(π/2) = 1.25331 η
p   = (ρ√2)σ = 0.67449 σ;   = ρ(1/h) = 0.47694 (1/h);   = (ρ√π)η = 0.84535 η
1/h = σ√2 = 1.41421 σ;   = (1/ρ)p = 2.09672 p;   = (√π)η = 1.77245 η
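
The numerical constants of this section can be recomputed from their definitions. The sketch below finds ρ by bisection, using the fact that the defining integral is the error function `math.erf` of Python's standard library; the variable names are chosen here for illustration, with σ taken as the unit.

```python
import math


def solve_rho(iterations=60):
    """Bisection for rho in (2/sqrt(pi)) * integral_0^rho exp(-t^2) dt = 1/2."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if math.erf(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


rho = solve_rho()
sigma = 1.0
probable_error = rho * math.sqrt(2) * sigma          # p
mean_abs_error = sigma * math.sqrt(2 / math.pi)      # eta
modulus = sigma * math.sqrt(2)                       # 1/h

print(f"rho     = {rho:.9f}")
print(f"p/sigma = {probable_error:.9f}")
print(f"eta/sigma = {mean_abs_error:.9f}")
print(f"(1/h)/sigma = {modulus:.5f}")
```

The computed values also show the order of magnitude of the four constants: the probable error is the smallest and the modulus the largest.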
Equation of the curve. The equation of the normal curve may be
written in seven ways, differing merely in respect to the choice of
constants [indicated in square brackets] which it is desired to reduce to unity.
(1) and (2) are in terms of the standard deviation, σ; (3) and (4) in terms
of the probable error, p; (5) and (6) in terms of the modulus, 1/h.

(1) [σ, A]     $\dfrac{y}{A/\sigma} = \dfrac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}(x/\sigma)^2}$;     $y_0 = 0.39894\,(A/\sigma)$
(2) [σ, y_0]   $\dfrac{y}{y_0} = e^{-\frac{1}{2}(x/\sigma)^2}$;     $A = 2.50663\,(\sigma y_0)$
(3) [p, A]     $\dfrac{y}{A/p} = \dfrac{\rho}{\sqrt{\pi}}\,e^{-\rho^2 (x/p)^2}$;     $y_0 = 0.26908\,(A/p)$
(4) [p, y_0]   $\dfrac{y}{y_0} = e^{-\rho^2 (x/p)^2}$;     $A = 3.71633\,(p y_0)$
(5) [h, A]     $\dfrac{y}{Ah} = \dfrac{1}{\sqrt{\pi}}\,e^{-(hx)^2}$;     $y_0 = 0.56419\,(Ah)$
(6) [1/h, y_0] $\dfrac{y}{y_0} = e^{-(hx)^2}$;     $A = 1.77245\,(y_0/h)$
(7) [A, y_0]   $\dfrac{y}{y_0} = e^{-\pi (x y_0/A)^2}$

[Figure 3. The normal curve. σ = Standard Deviation; p = Probable Error;
1/h = Modulus (ρ = .476936276 ···); η = Mean Absolute Error.]
The table on pages 209-16 is based on equation (1), giving $\dfrac{y}{A/\sigma}$
in terms of $x/\sigma$.

A table based on equation (2), giving $y/y_0$ in terms of $x/\sigma$, is found
in Yule, page 303, 1922 Edition.

The probability integral, α, is a function of x giving the area under
the normal curve, from -x to +x. It may be expressed in seven
different forms, corresponding to the seven forms of the equation of the
curve. The letters in [ ] indicate the quantities which may conveniently
be put equal to unity. As above, ρ = 0.47694.

(1) [σ, A]     $\dfrac{\alpha}{A} = \dfrac{2}{\sqrt{2\pi}}\int_0^{x/\sigma} e^{-\frac{1}{2}t^2}\,dt$
(2) [σ, y_0]   $\dfrac{\alpha}{\sigma y_0} = 2\int_0^{x/\sigma} e^{-\frac{1}{2}t^2}\,dt$
(3) [p, A]     $\dfrac{\alpha}{A} = \dfrac{2\rho}{\sqrt{\pi}}\int_0^{x/p} e^{-\rho^2 t^2}\,dt$
(4) [p, y_0]   $\dfrac{\alpha}{p y_0} = 2\int_0^{x/p} e^{-\rho^2 t^2}\,dt$
(5) [h, A]     $\dfrac{\alpha}{A} = \dfrac{2}{\sqrt{\pi}}\int_0^{hx} e^{-t^2}\,dt$
(6) [h, y_0]   $\dfrac{\alpha}{y_0/h} = 2\int_0^{hx} e^{-t^2}\,dt$
(7) [A, y_0]   $\dfrac{\alpha}{A} = 2\int_0^{x y_0/A} e^{-\pi t^2}\,dt$

The table on pages 209-16 is based on equation (1), giving $\alpha/A$ in terms of
$x/\sigma$. Sheppard’s Table, in Pearson’s Tables, gives $\tfrac{1}{2}(1 + \alpha/A)$
in terms of $x/\sigma$. Encke’s Table, reproduced in Wright and Hayford, Brunt,
etc., gives $\alpha/A$ in terms of $x/p$, based on (3). Oppolzer’s Table, based
on (6), gives $\dfrac{\alpha}{y_0/h}$ in terms of hx.

Burgess’s Table, reproduced in most of the books, is based on (5),
giving $\alpha/A$ in terms of hx.
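Approximate expressions for the probability integral.

Let A = total area under the normal curve
and α = area from -x to +x. Then

$\dfrac{\alpha}{A} = \dfrac{2}{\sqrt{\pi}}\int_0^{hx} e^{-t^2}\,dt = \dfrac{2}{\sqrt{2\pi}}\int_0^{x/\sigma} e^{-\frac{1}{2}t^2}\,dt$, where $A = \left(\dfrac{\sqrt{\pi}}{h}\right) y_0 = (\sigma\sqrt{2\pi})\,y_0$.

For small values of x:

$\dfrac{\alpha}{A} = \dfrac{2}{\sqrt{\pi}}\left[hx - \dfrac{(hx)^3}{1!\,3} + \dfrac{(hx)^5}{2!\,5} - \dfrac{(hx)^7}{3!\,7} + \dfrac{(hx)^9}{4!\,9} - \cdots\right]$

$= \dfrac{2}{\sqrt{2\pi}}\left[\dfrac{x}{\sigma} - \dfrac{(x/\sigma)^3}{2 \cdot 1!\,3} + \dfrac{(x/\sigma)^5}{2^2 \cdot 2!\,5} - \dfrac{(x/\sigma)^7}{2^3 \cdot 3!\,7} + \dfrac{(x/\sigma)^9}{2^4 \cdot 4!\,9} - \cdots\right]$

(Series convergent; error less than last term retained.)

For large values of x:

$1 - \dfrac{\alpha}{A} = \dfrac{e^{-(hx)^2}}{(hx)\sqrt{\pi}}\left[1 - \dfrac{1}{2(hx)^2} + \dfrac{1 \cdot 3}{2^2 (hx)^4} - \dfrac{1 \cdot 3 \cdot 5}{2^3 (hx)^6} + \cdots\right]$

$= \dfrac{e^{-\frac{1}{2}(x/\sigma)^2}}{(x/\sigma)\sqrt{\pi/2}}\left[1 - \dfrac{1}{(x/\sigma)^2} + \dfrac{1 \cdot 3}{(x/\sigma)^4} - \dfrac{1 \cdot 3 \cdot 5}{(x/\sigma)^6} + \dfrac{1 \cdot 3 \cdot 5 \cdot 7}{(x/\sigma)^8} - \cdots\right]$

(Series “asymptotic” or “semi-convergent”; but error is less than
last term retained.)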
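
Both series can be compared with the error function of Python's standard library, since α/A in terms of z = hx is `math.erf(z)`. The function names and the trial values z = 0.5 and z = 3 are assumptions made for this sketch.

```python
import math


def alpha_small(z, terms):
    """Power series for alpha/A in z = h*x; suited to small z."""
    total = sum((-1) ** k * z ** (2 * k + 1) / (math.factorial(k) * (2 * k + 1))
                for k in range(terms))
    return 2 / math.sqrt(math.pi) * total


def tail_large(z, terms):
    """Asymptotic series for 1 - alpha/A in z = h*x; suited to large z."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= -(2 * k + 1) / (2 * z * z)   # next term: 1, -1/(2z^2), 3/(4z^4), ...
    return math.exp(-z * z) / (z * math.sqrt(math.pi)) * total


print(alpha_small(0.5, 10), math.erf(0.5))
print(tail_large(3.0, 4), math.erfc(3.0))
```

Only a few terms of the asymptotic series should be taken: beyond a certain point its terms begin to grow, so adding more of them makes the result worse rather than better.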
Moments of the area of the normal curve, about the y-axis. The
nth moment, $\mu_n$, is defined by $\mu_n = \int_{-\infty}^{\infty} x^n y\,dx$, where y dx is an
element of area, and x the distance of that element from the y-axis.

If n is odd, $\mu_1 = \mu_3 = \mu_5 = \cdots = 0$. If n is even, we have $\mu_2 = A\sigma^2$,
$\mu_4 = 1 \cdot 3\,A\sigma^4$, $\mu_6 = 1 \cdot 3 \cdot 5\,A\sigma^6$, $\mu_8 = 1 \cdot 3 \cdot 5 \cdot 7\,A\sigma^8$, $\cdots$, where A is the
total area under the curve, and σ is the standard deviation.

Note that $\mu_4/(\mu_2^2) = 3/A$, $\mu_6/(\mu_2^3) = 15/A^2$, $\mu_8/(\mu_2^4) = 105/A^3$, $\cdots$


FUNDAMENTAL FORMULAS IN PROBABILITY 


The most important elementary formulas in probability, in the form
in which they are usually stated, are here collected for reference. The
true meaning and scope of these formulas is still a matter of much
controversy.¹

¹ For a recent critique, with full references, see J. M. Keynes, A Treatise on
Probability.

Probability a priori. If n events are regarded as “equally likely”
to happen, as far as we can judge on the basis of a given body of
information, and if m of these events are “favorable” while the rest are
“unfavorable,” then the ratio $p = \dfrac{m}{n}$ is called the “probability of
success,” while $q = 1 - p = \dfrac{n - m}{n}$ is the “probability of failure.”

For example, consider the throw of a perfect die; as far as we can
judge, any one of the six faces is as likely to turn up as any other; hence
the probability of throwing a 2-spot is 1/6. Again, in throwing two dice,
there are 36 equally probable results, of which two (namely, 5, 6 and
6, 5) will yield a total of 11; hence the probability of throwing 11 with
two dice is 1/18.

Addition formula. If $e_1, e_2, \cdots$ are “mutually exclusive” events,
and if their separate probabilities are $p_1, p_2, \cdots$, then the probability
that some one of the events will happen is the sum of the separate
probabilities: $p = p_1 + p_2 + \cdots$. For example, the probability of
throwing a 1-spot or a 2-spot with one die is 1/6 + 1/6 = 1/3.

Multiplication formula. If $e_1, e_2, \cdots$ are “independent” events,
with separate probabilities $p_1, p_2, \cdots$, then the probability that all of
the events will happen at once is the product of the separate
probabilities: $p = p_1 p_2 \cdots$. For example, the probability of throwing a double
2-spot with two dice is (1/6)(1/6) = 1/36.
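
The dice examples can be verified by counting equally likely cases with exact fractions. The helper `probability` is a hypothetical name introduced for this sketch.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)


def probability(is_favorable, outcomes):
    """Ratio m/n of favorable cases to all equally likely cases."""
    outcomes = list(outcomes)
    favorable = sum(1 for outcome in outcomes if is_favorable(outcome))
    return Fraction(favorable, len(outcomes))


two_spot = probability(lambda f: f == 2, faces)
total_eleven = probability(lambda d: sum(d) == 11, product(faces, repeat=2))
one_or_two = probability(lambda f: f in (1, 2), faces)
double_two = probability(lambda d: d == (2, 2), product(faces, repeat=2))

print(two_spot, total_eleven, one_or_two, double_two)
```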

Mathematical expectation. If p = the probability of winning an
amount w, then pw is called the mathematical expectation of the player.

Bernoulli’s theorem. This theorem consists of two parts. (a) If
an event is capable of being repeated many times, and the a priori
probability of success at each trial is always p, then m, the most probable
number of successes in n trials, will be np when np is an integer; or in
any case $np - q \le m \le np + p$ (where q = 1 - p).

(b) Further, the probability that the actual number of successes
shall differ from the most probable number, np, by less than a given
amount c, is approximately

$P_c = \dfrac{2}{\sqrt{\pi}}\int_0^x e^{-t^2}\,dt + \dfrac{e^{-x^2}}{\sqrt{2\pi npq}}$,

where q = 1 - p and $x = c/\sqrt{2npq}$.
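
Both parts of the theorem can be compared with exact binomial probabilities. In the sketch below the trial values (n = 7 and p = 0.3 for part (a); n = 100, p = 0.5, c = 5 for part (b)) are assumptions, and the approximation of part (b) is compared with the exact probability that the number of successes differs from np by not more than c, which is one reading of the statement.

```python
import math


def binomial_probability(n, p, k):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def most_probable_successes(n, p):
    """The number of successes with the largest binomial probability."""
    return max(range(n + 1), key=lambda k: binomial_probability(n, p, k))


def exact_within(n, p, c):
    """Exact probability that |k - n*p| <= c."""
    return sum(binomial_probability(n, p, k)
               for k in range(n + 1) if abs(k - n * p) <= c)


def approx_within(n, p, c):
    """Part (b): the error-function term plus the correction term."""
    q = 1 - p
    x = c / math.sqrt(2 * n * p * q)
    return math.erf(x) + math.exp(-x * x) / math.sqrt(2 * math.pi * n * p * q)


print(most_probable_successes(7, 0.3))                    # lies between np - q and np + p
print(exact_within(100, 0.5, 5), approx_within(100, 0.5, 5))
```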
Probability a posteriori. Suppose an event has occurred as a result
of one or another of several causes, $C_1, C_2, \cdots$. Let $p_i$ be the probability
that $C_i$ is present (before the occurrence of the event). Let $P_i$ be the
probability that $C_i$, if present, would produce the event. Then the
probability that $C_i$ was the actual cause is

$(p_i P_i)/\sum(p_j P_j)$.
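
The formula is easily applied with exact fractions. The two causes and their probabilities in the sketch below are invented purely for illustration.

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """(p_i * P_i) / sum(p_j * P_j) for every cause C_i."""
    joint = [p * P for p, P in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical case: C_1 is twice as likely to be present as C_2,
# but C_2 is five times as likely to produce the event.
priors = [Fraction(2, 3), Fraction(1, 3)]
likelihoods = [Fraction(1, 10), Fraction(1, 2)]
print(posterior(priors, likelihoods))
```

After the event has occurred, the second cause is the more probable one, although it was the less probable beforehand.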
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QUADRATURE FORMULAS, FOR NUMERICAL INTEGRATION 


b 
The area underacurve. The definite integral, A = f f(x)dz, repre- 


sents the area under the curve y = f(z) from x = a to x = b (that is, 
the area bounded by the curve, the axis of z, and the ordinates corre- 
sponding tox = a and z = b; any area below the axis being taken as 
negative). The problem of quadrature is to compute this area (ap- 
proximately) when a finite number of equally spaced ordinates are 
given. The best-known formulas for this purpose are listed below, 

We suppose the interval from xz = a to x = b is divided into n equal 
parts, of length h, so that h = (b — a)/n. Also let 


Yo =f(2), yr=fath), yo=flat2h),++> 
8 Yn-2 = f(b — 2h), yr = f(b —h), yn =f). 
(1) Trapezoidal rule:

$$A = h\{\tfrac12 y_0 + y_1 + y_2 + \cdots + y_{n-2} + y_{n-1} + \tfrac12 y_n\}.$$

The error involved in using this rule, that is, the correctional
term which would have to be added to make the rule exact, is
$-\frac{nh^3}{12} f''(X)$, where the accents denote differentiation, and
$X$ is some unknown value of $x$ between $a$ and $b$.

(2) Simpson’s rule (if $n$ is a multiple of 2):

$$A = \tfrac13 h\{y_0 + (4y_1 + 2y_2) + (4y_3 + 2y_4) + \cdots + (2y_{n-2} + 4y_{n-1}) + y_n\}.$$

Here the correctional term is $-\frac{nh^5}{180} f''''(X)$, where $X$ is
some unknown value of $x$ between $a$ and $b$.

The special case when $n = 2$ is called the prismoid formula:

$$A = (h/3)\{y_0 + 4y_1 + y_2\}, \text{ with error } = -(h^5/90) f''''(X).$$

If $f(x)$ is a polynomial of not higher than the third degree, its fourth
derivative will be zero, and the prismoid formula will be exact. Further,
if $f(x)$ is a polynomial of the fourth degree, its fourth derivative will
be constant, and exact results can be obtained by using the prismoid
formula, with the correctional term.

(3) The “three-eighths” rule (if $n$ is a multiple of 3):

$$A = \tfrac38 h\{y_0 + (3y_1 + 3y_2 + 2y_3) + (3y_4 + 3y_5 + 2y_6) + \cdots + (2y_{n-3} + 3y_{n-2} + 3y_{n-1}) + y_n\}.$$

The special case when $n = 3$ gives

$$A = \tfrac38 h\{y_0 + 3y_1 + 3y_2 + y_3\}.$$

(4) Weddle’s rule (if $n$ is a multiple of 6):

$$A = \tfrac{3}{10} h\{y_0 + (5y_1 + y_2 + 6y_3 + y_4 + 5y_5 + 2y_6) + (5y_7 + y_8 + 6y_9 + y_{10} + 5y_{11} + 2y_{12}) + \cdots$$
$$\qquad + (2y_{n-6} + 5y_{n-5} + y_{n-4} + 6y_{n-3} + y_{n-2} + 5y_{n-1}) + y_n\}.$$

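(5) Sheppard’s rule XI (if $n$ is a multiple of 12):

$$A = A_1 + \tfrac35(A_1 - A_2) - \tfrac15(A_2 - A_3) + \tfrac1{35}(A_3 - A_4),$$

where

$$A_1 = h\{\tfrac12 y_0 + y_1 + y_2 + \cdots + y_{n-2} + y_{n-1} + \tfrac12 y_n\},$$
$$A_2 = 2h\{\tfrac12 y_0 + y_2 + y_4 + \cdots + y_{n-4} + y_{n-2} + \tfrac12 y_n\},$$
$$A_3 = 3h\{\tfrac12 y_0 + y_3 + y_6 + \cdots + y_{n-6} + y_{n-3} + \tfrac12 y_n\},$$
$$A_4 = 4h\{\tfrac12 y_0 + y_4 + y_8 + \cdots + y_{n-8} + y_{n-4} + \tfrac12 y_n\}.$$
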
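(6) The Euler-Maclaurin formula:

$$A = h\{\tfrac12 y_0 + y_1 + y_2 + \cdots + y_{n-1} + \tfrac12 y_n\} - \frac{B_1 h^2}{2!}[f'(b) - f'(a)]$$
$$\qquad + \frac{B_2 h^4}{4!}[f'''(b) - f'''(a)] - \cdots + (-1)^k \frac{B_k h^{2k}}{(2k)!}[f^{(2k-1)}(b) - f^{(2k-1)}(a)] + R.$$

Here

$$R = (-1)^{k+1} \frac{B_{k+1}}{(2k+2)!}\, n h^{2k+3} f^{(2k+2)}(X),$$

where $X$ is some unknown number between $a$ and $b$; and $B_1 = \tfrac16$,
$B_2 = \tfrac1{30}$, $B_3 = \tfrac1{42}$, $\cdots$ are the Bernoulli numbers.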
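
As a rough illustration of rules (1) and (2), the following Python sketch evaluates the trapezoidal rule and Simpson's rule from equally spaced ordinates. The function names and the test integrand $x^3$ on the interval 0 to 2 are assumptions chosen for the example; the exact area in that case is 4.

```python
def trapezoidal(f, a, b, n):
    """Rule (1): h{y0/2 + y1 + ... + y(n-1) + yn/2}."""
    h = (b - a) / n
    y = [f(a + i * h) for i in range(n + 1)]
    return h * (0.5 * y[0] + sum(y[1:-1]) + 0.5 * y[-1])

def simpson(f, a, b, n):
    """Rule (2): (h/3){y0 + 4(odd ordinates) + 2(even interior ordinates) + yn}."""
    if n % 2:
        raise ValueError("n must be a multiple of 2")
    h = (b - a) / n
    y = [f(a + i * h) for i in range(n + 1)]
    return (h / 3) * (y[0] + 4 * sum(y[1:-1:2]) + 2 * sum(y[2:-1:2]) + y[-1])

def cube(x):
    return x ** 3

print(trapezoidal(cube, 0.0, 2.0, 2))  # 5.0
print(simpson(cube, 0.0, 2.0, 2))      # 4.0
```

With $n = 2$ and $h = 1$ the trapezoidal value 5 exceeds the true area by 1, which agrees with the correctional term $-\frac{nh^3}{12}f''(X) = -X$ taken at $X = 1$; the prismoid formula is exact, as stated above for a polynomial of the third degree.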




INFINITE SERIES 


Taylor’s theorem. If a function f(x) has derivatives of all orders at 
a point x = a, then for any value of x sufficiently near the value x = a, 
the function may be expanded into a power series arranged according 
to ascending powers of $x - a$, as follows:

$$f(x) = f(a) + \frac{f'(a)}{1!}(x - a) + \frac{f''(a)}{2!}(x - a)^2 + \cdots + \frac{f^{(n)}(a)}{n!}(x - a)^n + R_{n+1},$$

where the remainder, $R_{n+1}$, lies between the largest and smallest values
of $\frac{f^{(n+1)}(\xi)}{(n+1)!}(x - a)^{n+1}$ for values of $\xi$ between $a$ and $x$.

If the remainder, $R_{n+1}$, is small, the first few terms of the series
give a good approximation to the value of the function.

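Maclaurin’s theorem is a special case of Taylor’s theorem when $a = 0$.
A few important expansions are as follows:

$$(1 + x)^n = 1 + nx + \frac{n(n-1)}{2!}x^2 + \frac{n(n-1)(n-2)}{3!}x^3 + \cdots \qquad -1 < x < +1$$
$$e^x = 1 + \frac{x}{1!} + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots \qquad -\infty < x < +\infty$$
$$\log_e(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots \qquad -1 < x \le +1$$
$$\log_e\frac{1 + x}{1 - x} = 2\left(x + \frac{x^3}{3} + \frac{x^5}{5} + \cdots\right) \qquad -1 < x < +1$$
$$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots \qquad -\infty < x < +\infty$$
$$\cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \frac{x^8}{8!} - \cdots \qquad -\infty < x < +\infty$$
$$\tan x = x + \frac{x^3}{3} + \frac{2x^5}{15} + \frac{17x^7}{315} + \cdots \qquad -\frac{\pi}{2} < x < +\frac{\pi}{2}$$
$$\frac{x}{e^x - 1} = 1 - \frac{x}{2} + \frac{B_1 x^2}{2!} - \frac{B_2 x^4}{4!} + \frac{B_3 x^6}{6!} - \cdots \qquad -2\pi < x < +2\pi$$

where $B_1 = \tfrac16$, $B_2 = \tfrac1{30}$, $\cdots$ are the Bernoulli numbers.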


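Fourier’s series. While a Taylor’s series approximates a function
near a particular point, $x = a$, a Fourier’s series gives an
approximation over a whole range.

If $f(x)$ is finite, and has only a finite number of discontinuities and
only a finite number of maxima and minima, in an interval from $x = -c$
to $x = c$, then for any value of $x$ between $-c$ and $+c$

$$f(x) = \frac{a_0}{2} + a_1 \cos\frac{\pi x}{c} + a_2 \cos\frac{2\pi x}{c} + a_3 \cos\frac{3\pi x}{c} + \cdots$$
$$\qquad + b_1 \sin\frac{\pi x}{c} + b_2 \sin\frac{2\pi x}{c} + b_3 \sin\frac{3\pi x}{c} + \cdots,$$

where $a_m = \frac{1}{c}\int_{-c}^{c} f(t) \cos\frac{m\pi t}{c}\,dt$ and $b_m = \frac{1}{c}\int_{-c}^{c} f(t) \sin\frac{m\pi t}{c}\,dt$.

We note that $\int_0^{\pi} \sin^2 mx\,dx = \frac{\pi}{2}$ and $\int_0^{\pi} \cos^2 mx\,dx = \frac{\pi}{2}$, while if $k$ is
different from $m$,

$$\int_0^{\pi} \sin kx \cdot \sin mx\,dx = 0 \text{ and } \int_0^{\pi} \cos kx \cdot \cos mx\,dx = 0.$$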


CHAPTER II 


FREQUENCY DISTRIBUTIONS 
AVERAGES AND MEASURES OF DISPERSION 
(ELEMENTARY METHODS)


By H. L. RIETZ 


INTRODUCTION 


The material of statistics. Items and collections of items may be 
regarded as the material about which the science of statistics is being 
developed. Each item may be simply the record of the presence of a 
certain quality in an individual; for example, it may indicate the color, 
sex, or occupation of an individual. Each item may be the result of 
counting or enumeration; for example, the number of children in a 
family or in a school, the population of a state, the number of births, 
deaths, or accidents. Each item may be a measurement or an estimate;
for example, height, weight, or annual income of an individual; monthly
pig iron production, interest rates, commodity prices, and so on. 

Variables. Inequality among the items is a property of a statistical 
collection. On account of this property, a symbol used to represent the 
magnitudes of the items of a set may be appropriately called a variable. 

We shall find it convenient to recognize two classes of variables: 
discrete and continuous. 

A discrete variable is one whose values differ by assigned steps, often
by unity; for example, the number of children in a family, the number
of rows of kernels on an ear of corn. 

A continuous variable is one whose values may differ by amounts 
which are indefinitely small; for example, the weight of a man, the tem- 
perature at a place. 
A frequency distribution is an arrangement which shows the
frequencies of the values of a variable in ordered classes. We may exhibit
frequency distributions as follows:

1 For early writings on frequency theory, see Ellis, Trans. of Camb. Phil. Soc.,
vols. 8, 9 (1843-44); Venn, Logic of Chance (1866).

Example 1.

Number of rows of kernels on ears of corn | 10 | 12 | 14 | 16 | 18 | 20 | 22 | 24
Frequencies | 1 | 16 | 109 | 241 | 235 | 116 | 41 | 10


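Example 2.

Weight in pounds of men aged 20-24 and of height 5 ft. 8 in. | 100 | 105 | 110 | 115 | 120 | 125 | 130 | 135 | 140 | 145 | 150 | 155 | 160 | 165 | 170
Frequencies¹ | 3 | 6 | 23 | 48 | 56 | 86 | 78 | 45 | 30 | 22 | 11 | 3 | 3 | 1 | 1
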
Example 3.

Number of “alpha-particles” radiated from a disk in one eighth of a minute | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | ···
Frequencies² | 57 | 203 | 383 | 525 | 532 | 408 | 273 | 139 | ···


The first of these distributions is clearly with respect to a discrete 
variable with its values equal to even integers only. The second is with 
respect to a continuous variable. The third is with respect to a discrete 
variable with integral values only. 

Class interval and class mark. A class interval is an interval which 
sets bounds to a class of a frequency distribution; for example, 102.5 to
107.5, 107.5 to 112.5 are class intervals for weight in the above
example 2. A class mark is the value which is represented by the mid-value
of the class interval; for example, 100, 105, 110, . . . are class marks in
example 2.

With a discrete variable, the exact values of the variable are usually 
given by class marks. With a continuous variable, all the values that 
fall within a given class interval are for certain purposes grouped at the 
class mark as a convenient approximation. 

The number of values of a variable within a class interval is called a
class frequency; for example, 1, 16, 109, 241, 235, 116, 41, 10 are the
class frequencies of example 1.

Frequency polygon. With a convenient horizontal scale for weight 
in pounds and a vertical scale for class frequencies, we plot in Figure 5 
the class frequencies of example 2 as ordinates at the class marks 100, 
105, 110, . . . shown below the base line. The frequency polygon (Fig. 5) 
is obtained by joining these points by straight lines. Each end point 
is joined to the base at the next class mark to close the polygon. 

1 Medico-Actuarial Investigation, vol. 1 (1912), p. 40.
2 Rutherford and Geiger, “The Probability Variations in the Distribution of
Alpha-Particles,” Phil. Mag., series 6, vol. 20 (1910), pp. 698-701.

Frequency rectangles. Construct rectangles such as ABCD (Fig. 5)
with class intervals as bases, and with altitudes equal to the ordinates
of points such as are plotted in Figure 5. We call these rectangles the
frequency rectangles.

The upper boundaries of these frequency rectangles together with
the vertical segments joining the ends of these upper bases, as shown
in Figure 5, give a graph called the histogram of the distribution.

Significance of area of frequency rectangles. Let us define unit area
as that of a rectangle whose base is a class interval and whose altitude
represents a unit of frequency. Then the area of ABCD (Fig. 5) is 48,
and the total area of all the frequency rectangles is equal to the total
frequency.

[Fig. 5. Frequency polygon and histogram. Horizontal axis: weights, in
pounds, of men aged 20-24 and of height 5 ft. 8 in. Vertical axis: class
frequencies.]

Frequency curve. A frequency curve may be regarded as an esti- 
mate of the limit that would probably be approached by a frequency 
polygon or histogram (Fig. 5) if the class intervals were made smaller 
and smaller, and the frequency N were at the same time increased 
without limit. The number of observations that will in the long run 
fall between assigned values a and b is proportional to the shaded area. 
Thus, if $y = f(x)$ is the frequency curve,

$$\int_a^b y\,dx$$

is the most probable frequency of observations in the interval $a$
to $b$.

Sometimes the whole area under a frequency curve is taken as the
unit of area. Then

$$\int_{-\infty}^{+\infty} y\,dx = 1$$

and an area represents a relative frequency or statistical probability.
Thus,

$$\int_a^b y\,dx$$

gives the probability that an observation taken at random from a set
will fall into the interval $a$ to $b$.

[Fig. 6. Frequency curve, with the area between the ordinates at a and b
shaded.]

A cumulative frequency distribution. Start at one end of a frequency
distribution such as that in example 2 by recording the end class
frequency, then form the sum of two class frequencies nearest the end,
then of the three nearest the end, and so on. When the end value and
these sums are arranged in the order in which they are obtained, the
distribution is called a cumulative frequency distribution. For
example, by thus adding frequencies in example 2, page 21, we obtain

Example 4.

Weight in pounds below  | 102.5 | 107.5 | 112.5 | 117.5 | 122.5 | 127.5 | 132.5 | 137.5
Cumulative frequencies  |   3   |   9   |  32   |  80   |  136  |  222  |  300  |  345

Weight in pounds below  | 142.5 | 147.5 | 152.5 | 157.5 | 162.5 | 167.5 | 172.5
Cumulative frequencies  |  375  |  397  |  408  |  411  |  414  |  415  |  416

as a cumulative frequency distribution. It shows the number of
individuals whose weight does not exceed an assigned value.
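
The running sums of example 4 can be reproduced with a short Python sketch. The variable names are our own; the class marks and frequencies are those of example 2, and each cumulative frequency is attached to the upper bound of its class interval.

```python
from itertools import accumulate

# Class marks and class frequencies of example 2 (5-pound classes).
class_marks = list(range(100, 175, 5))
freqs = [3, 6, 23, 48, 56, 86, 78, 45, 30, 22, 11, 3, 3, 1, 1]

# A cumulative frequency refers to the upper bound of the class interval.
upper_bounds = [x + 2.5 for x in class_marks]
cumulative = list(accumulate(freqs))

for bound, total in zip(upper_bounds, cumulative):
    print(f"below {bound}: {total}")
```

Plotting `cumulative` against `upper_bounds` gives the ogive described in the next paragraph.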
Ogive. The graph of a cumulative frequency distribution on
coordinate paper is called an ogive.


APPLICATION OF AVERAGES TO FREQUENCY DISTRIBUTIONS 


In the description of a frequency distribution, we usually make use of 
certain averages already defined in Chapter I. In this connection, 
the averages may take on further meanings in the characterization of 
the distribution. 

Graphical meaning of the arithmetic mean. In the graphical repre- 
sentation of a frequency distribution (Figs. 5 and 6), the arithmetic 
mean (AM) of all the values of a variable is the abscissa of the centroid 
of the total area under the frequency curve. From a sample set
consisting of N values of the variable, we actually compute the abscissa
of the centroid of the area of the frequency rectangles.
Computation of the arithmetic mean $\bar{X}$ when a frequency distribution
is given.
Let $f_1, f_2, \cdots, f_n$ be class frequencies,
$X_1, X_2, \cdots, X_n$ corresponding class marks,
and $N = f_1 + f_2 + \cdots + f_n$.
By definition, the arithmetic mean

$$\bar{X} = \frac{f_1 X_1 + f_2 X_2 + \cdots + f_n X_n}{N} \qquad (1)$$
$$\quad = X_0 + \frac{f_1(X_1 - X_0) + f_2(X_2 - X_0) + \cdots + f_n(X_n - X_0)}{N}$$
$$\quad = X_0 + \frac{1}{N}\Sigma f(X - X_0), \qquad (2)$$

where $X_0$ is an arbitrary number.
The formula (2), with $X_0$ chosen as the class mark near the mean, is
usually more convenient than (1) for numerical computation, when N
is large.
The arithmetic mean is then

$$\bar{X} = X_0 + b, \qquad (3)$$

where

$$b = \frac{f_1(X_1 - X_0) + f_2(X_2 - X_0) + \cdots + f_n(X_n - X_0)}{N} = \frac{1}{N}\Sigma f(X - X_0) \qquad (4)$$

is the mean value of the deviations $X - X_0$, and may be regarded as
the correction to be applied to the guess $X_0$ to get the mean. A form
for the computation of $\bar{X}$ by this method is shown in example 5.
Weighted arithmetic mean. The weighted arithmetic mean of $X_1,
X_2, \cdots, X_n$ with weights $W_1, W_2, \cdots, W_n$ is defined as

$$\bar{X} = \frac{W_1 X_1 + W_2 X_2 + \cdots + W_n X_n}{W_1 + W_2 + \cdots + W_n}. \qquad (5)$$

It is obvious from (1) and (5) that $\bar{X}$ in (1) may be regarded as the
arithmetic mean of class marks weighted with corresponding class
frequencies.

Form for the calculation of the arithmetic mean for a given frequency
distribution. Let $X$ = class mark, $f$ = frequency, $X_0$ = a selected
class mark near the mean. Deviations $X - X_0$ are in units of a class
interval.
Example 5. Finding the AM from the frequency distribution of example 2.

 (1)     (2)       (3)           (4)
  X       f     X - X_0      f(X - X_0)
 170       1       9              9
 165       1       8              8
 160       3       7             21
 155       3       6             18
 150      11       5             55
 145      22       4             88
 140      30       3             90
 135      45       2             90
 130      78       1             78
 125      86       0              0     (+457)
 120      56      -1            -56
 115      48      -2            -96
 110      23      -3            -69
 105       6      -4            -24
 100       3      -5            -15
         416                   -260
                               +197

$b = +\frac{197}{416} = 0.474$ class interval = 2.37 pounds.

$\bar{X} = 125 + 2.37 = 127.37$ pounds.


A good check on the accuracy of the computation consists in recal- 
culating the mean after changing the origin of deviations by one class 
interval. 
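
The computation of example 5, together with the check just described, can be sketched in Python as follows. The function name `mean_by_assumed_origin` is our own; the data are the class marks and frequencies of example 2.

```python
class_marks = list(range(100, 175, 5))
freqs = [3, 6, 23, 48, 56, 86, 78, 45, 30, 22, 11, 3, 3, 1, 1]
width = 5  # class interval in pounds

def mean_by_assumed_origin(marks, freqs, x0, width):
    """Formula (3): mean = X0 + b, with b computed in class-interval units."""
    n = sum(freqs)
    b = sum(f * (x - x0) / width for x, f in zip(marks, freqs)) / n
    return x0 + b * width

mean_125 = mean_by_assumed_origin(class_marks, freqs, 125, width)
# Check: shift the origin of deviations by one class interval.
mean_130 = mean_by_assumed_origin(class_marks, freqs, 130, width)
print(round(mean_125, 2), round(mean_130, 2))  # 127.37 127.37
```

Both origins lead to the same mean, as they must, since $X_0$ in (2) is arbitrary.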

Geometric mean.¹ In terms of the symbols defined above, the
geometric mean (GM) is

$$GM = \sqrt[N]{X_1^{f_1} X_2^{f_2} \cdots X_n^{f_n}},$$

or

$$\log GM = \frac{f_1 \log X_1 + f_2 \log X_2 + \cdots + f_n \log X_n}{N}. \qquad (6)$$

In the right-hand member of (6), we have simply the arithmetic
mean of the logarithms, $\log X_1, \log X_2, \cdots, \log X_n$, in place of the
arithmetic mean of the numbers given in (1). Thus, the geometric mean
is simply the antilogarithm of the arithmetic mean of the logarithms of
the numbers. The calculation of the geometric mean is thus reduced
to the calculation of the arithmetic mean of the logarithms.

Harmonic mean.² Since the harmonic mean (p. 5) of a set of numbers
is the reciprocal of the arithmetic mean of the reciprocals of the
numbers, the calculation of the harmonic mean is at once reduced to
that of calculating an arithmetic mean.

1 For uses of geometrical mean, see G. Udny Yule, Introduction to Theory of
Statistics (1911), pp. 125-28.

2 For the meaning of a certain useful weighted arithmetic mean and of a
corresponding weighted harmonic mean to be used in measuring changes in the general
price level, see Allyn A. Young, “The Measurement of Changes in the General Price
Level,” The Quarterly Journal of Economics, vol. 35 (1921), pp. 557-73.

Graphical meaning of the median. In the graphical representation
of a frequency distribution (Figs. 5 and 6), the median of a variable is
the abscissa of a point the vertical through which divides the total area
under the frequency curve into two equal parts. What we actually
compute for the median of a sample set of N values of the variable is
the abscissa of a point the vertical through which divides the total area
of the frequency rectangles into two equal parts.

Computation of the median when the frequency distribution is given. 
The calculation of the median when the frequency distribution is given 
usually involves interpolation by proportional parts. We shall find as 
an illustration the median of weight of men for which the distribution 
is given in example 2. The median weight of the 416 men is a weight 
such that 208 are below this value and 208 are above it. 

From the cumulative frequencies of example 4, page 23, we note that 
the median falls into the class interval 122.5 to 127.5. Below 122.5 
there are 136 cases. Below 127.5 there are 222 cases. 

Hence, of the class frequency, 86, at 125, there are 72 below the median 
and 14 above it. Hence, by interpolation by proportional parts, the 
median M is 


$$M = 122.5 + \tfrac{72}{86}(5) = 126.69 \text{ pounds.} \qquad (7)$$


The accuracy of the result depends upon the usual assumption in such 
interpolations that the distribution is uniform in the interval from 122.5 
to 127.5. 
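
The interpolation in (7) may be sketched in Python as below. The helper `grouped_quantile` is a hypothetical name; it walks through the classes until the cumulative frequency reaches the required rank and then interpolates by proportional parts, assuming the observations are uniformly spread within the class.

```python
def grouped_quantile(lower_bounds, freqs, width, rank):
    """Value below which `rank` observations fall, by proportional parts."""
    cum = 0
    for lo, f in zip(lower_bounds, freqs):
        if f > 0 and cum + f >= rank:
            return lo + width * (rank - cum) / f
        cum += f
    raise ValueError("rank exceeds the total frequency")

# Example 2: lower class bounds 97.5, 102.5, ..., 167.5.
lower_bounds = [97.5 + 5 * i for i in range(15)]
freqs = [3, 6, 23, 48, 56, 86, 78, 45, 30, 22, 11, 3, 3, 1, 1]

median = grouped_quantile(lower_bounds, freqs, 5, sum(freqs) / 2)
print(round(median, 2))  # 126.69
```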

The mode. In the graphical representation of a frequency dis- 
tribution (Figs. 5 and 6), the mode is the abscissa of a maximum value 
of the frequency function. Experience has shown that a relatively 
large number of frequency distributions have only one mode, but some 
have two or more modes. 

The accurate determination of the mode is not usually a simple matter 
and belongs to a later chapter in which we determine the equations of 
frequency curves (Chap. VII). 

In the frequency distribution of weights (p. 21), it may be noted that 
the class frequencies increase up to 86 at 125 and then decrease. The 
mode is then probably in the neighborhood of 125, but all values from 
122.5 to 127.5 were placed in the 125-pound class. 

The point at which the frequency is most dense is the abscissa of the
maximum on the frequency curve, and can be determined accurately
only from the equation of the curve.

For a given grouping, the class mark of the maximal class frequency
is called the empirical mode to distinguish it from the mode defined
above. Thus, 125 is the empirical mode in example 2.

An approximation¹ to the mode, measured from the lower limit of
the modal class, is given by

$$x = -\frac{\Delta f_{-1}}{\Delta^2 f_{-1}},$$

where $f$ is the frequency of the modal class, $f_{-1}$ the frequency of the
class next below, $f_1$ that of the class next above the modal class; and
$\Delta$ and $\Delta^2$ denote first and second differences.
For the frequency distribution of example 2,

 f_{-1} = 56
                 30
 f      = 86           -38
                 -8
 f_1    = 78

Then $x = \tfrac{30}{38} = .79$ class interval units measured from the lower limit
of the class 125. Thus, the mode is

122.5 + 5(.79) = 126.45 pounds.


As indicated by Pearson,² this method should be used with caution
because of the large probable error in the results.
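
A minimal Python sketch of this approximation follows; the function name is our own, and the three frequencies are those of the modal class of example 2 and its two neighbors.

```python
def mode_from_differences(f_below, f_modal, f_above, lower_limit, width):
    """Approximate mode from the first and second differences of frequencies."""
    d1 = f_modal - f_below          # first difference of f(-1)
    d2 = (f_above - f_modal) - d1   # second difference of f(-1)
    x = -d1 / d2                    # in class-interval units from the lower limit
    return lower_limit + width * x

mode = mode_from_differences(56, 86, 78, 122.5, 5)
print(round(mode, 2))  # 126.45
```

Because the result rests on only three class frequencies, small changes in the grouping can move it appreciably, which is the caution noted above.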


DISPERSION OR VARIABILITY 


Measures of dispersion. After finding an average of a set of observa- 
tions, it is usually important to determine the extent to which the values 
are scattered from this average. 

If the dispersion is to be measured by a single number, it is surely
appropriate to use some kind of an average of deviations. The
“root-mean-square of deviations” and the “mean of the absolute values” of
deviations are the averages which have been much used for this purpose.

It is our purpose to show how these measures of dispersion may be 
most conveniently calculated, and to give a notion of their geometrical 
meanings in the description of a frequency distribution. 

1 Czuber, Die Statistischen Forschungsmethoden (1921), pp. 71-72.
2 Biometrika, vol. 1 (1902), p. 260.

Standard deviation. The standard deviation is the square root of
the arithmetic mean of the squares of deviations of values of the variable
from their arithmetic mean. The standard deviation is very commonly
denoted by $\sigma$. Thus, with the notation of page 24,

$$\sigma^2 = \frac{f_1(X_1 - \bar{X})^2 + f_2(X_2 - \bar{X})^2 + \cdots + f_n(X_n - \bar{X})^2}{N} \qquad (8)$$
$$\quad = \frac{1}{N}\Sigma f(X - \bar{X})^2. \qquad (9)$$

The calculation of $\sigma$ from this formula is likely to be much more
laborious than the calculation from the equivalent formula

$$\sigma^2 = \frac{f_1(X_1 - X_0)^2 + f_2(X_2 - X_0)^2 + \cdots + f_n(X_n - X_0)^2}{N} - b^2, \qquad (10)$$

where $X_0$ is a selected class mark near the mean as defined on page 24
and $b = \bar{X} - X_0$.

Graphical meaning of $\sigma$ in a normal distribution. In case we have
what is known as a normal distribution (see p. 11), the standard devia- 
tion is one half the distance between the points of inflection on the fre- 
quency curve. 

Form for the calculation of the standard deviation of a given frequency
distribution. We use here the notation shown in the form on page 25.
We add to the four columns of that form a fifth column headed
$f(X - X_0)^2$. This column is obtained in an obvious manner from
columns (3) and (4). We add also a check column (6) headed
$f(X + 1 - X_0)^2$, formed in an obvious manner by lowering the origin
of deviations in column (3) by one class interval.


Example 6. Finding the standard deviation of the frequency distribution of
example 2. The columns (1), (2), (3), (4) are simply copied from example 5.

 (1)    (2)      (3)          (4)            (5)              (6)
  X      f    X - X_0    f(X - X_0)    f(X - X_0)^2    f(X + 1 - X_0)^2
 170      1      9            9             81              100
 165      1      8            8             64               81
 160      3      7           21            147              192
 155      3      6           18            108              147
 150     11      5           55            275              396
 145     22      4           88            352              550
 140     30      3           90            270              480
 135     45      2           90            180              405
 130     78      1           78             78              312
 125     86      0            0              0               86
 120     56     -1          -56             56                0
 115     48     -2          -96            192               48
 110     23     -3          -69            207               92
 105      6     -4          -24             96               54
 100      3     -5          -15             75               48

$\Sigma f(X - X_0)^2 = 2181.$
$\frac{1}{N}\Sigma f(X - X_0)^2 = 5.243$
$b^2 = .225$
$\sigma^2 = 5.018$¹
$\sigma = 2.240$ class intervals
= 11.20 pounds.


Charlier Check

$$\Sigma f(X + 1 - X_0)^2 = \Sigma f(X - X_0)^2 + 2\Sigma f(X - X_0) + \Sigma f.$$
2991 = 2181 + 2(197) + 416
= 2991.

Coefficient of variability. The ratio $C = \sigma/\bar{X}$ of the standard
deviation to the arithmetic mean is called the coefficient of variability
and is usually expressed as a percentage.

Mean or average deviation. The mean or average deviation of the 
values of a variable from an average is the arithmetic mean of the de- 
viations treating them all as positive. The deviations may be taken 
from the arithmetic mean or from the median, but the mean deviation 
is least when the median is taken as the origin or zero point from which 
deviations are measured. 

Graphical meaning of mean deviation of a normal distribution. When 
we have a normal distribution (see page 11), and the origin is at the 
arithmetic mean or median, the mean deviation is simply the abscissa 
of the centroid of the area under the right-hand half of the frequency 
curve. Furthermore, there is, in the case of a normal distribution, 
a simple relation between the mean deviation and standard deviation.
Thus,

Mean deviation = $.7979\sigma$ = $.8\sigma$ roughly. (11)


Calculation of mean deviation from a given frequency distribution. 
We find first the sum of the absolute values of the deviations from the 
center of the class interval within which the average falls from which 
the deviations are to be taken, but we do not include in this the values 
in that class interval in which the average falls. 

1 A correction, known as Sheppard’s correction, may be applied to this
approximate value of $\sigma^2$ to correct for grouping into 5-pound classes. The method
of making this refinement in the calculation will be given in Chapter VII. (See
page 93.) The use of Sheppard’s correction is restricted to cases in which the
frequency curve has high order contact with the x-axis at the ends of the range.

We should next make corrections to this sum for the distance of the
average from the center of the class interval in which the average lies,
and for the fact that the observations in this class interval usually lie
partly above and partly below the average. We shall refer to these two
corrections as correction (1) and correction (2)¹ respectively. To be more
specific, let us assume that the deviations are measured from the median.

Correction (1). If $N_a$ is the number of observations above and $N_b$
the number below the class interval in which the median lies, and $c$
is the distance in units of class intervals from the center of the class
interval to the median, we have as correction (1),

$$(N_b - N_a)c.$$


Correction (2). Let $N_m$ be the number of observations in the class
interval in which the median lies. If we assume these $N_m$ values
uniformly distributed over the unit interval, we have, by proportional
parts, $(.5 + c)N_m$ values below the median and $(.5 - c)N_m$ values
above the median. With a uniform distribution over the unit interval,
the sum of these deviations below the median would be

$$\frac{(.5 + c)^2 N_m}{2}$$

and above the median

$$\frac{(.5 - c)^2 N_m}{2}.$$

That is, the part of the area $N_m$ below the median ordinate is $(.5 + c)N_m$
and the AM of the deviations of its points from the median is $\frac{.5 + c}{2}$.
The product of $(.5 + c)N_m$ and $\frac{.5 + c}{2}$ is the sum of the deviations below
the median ordinate. Similarly, the part of the area $N_m$ above the
median ordinate is $(.5 - c)N_m$ and the AM of the deviations of its points
from the median is $\frac{.5 - c}{2}$. The sum of the deviations of the $N_m$ values
is then

$$\frac{(.5 + c)^2 N_m}{2} + \frac{(.5 - c)^2 N_m}{2} = (.25 + c^2)N_m.$$

Let $X$ and $f$ have the same meanings as in the computation on page 24,
and let $|X - X_0|$ be the absolute value of $X - X_0$. For the frequency
distribution of example 2, the median found in (7) is 126.69 and thus
lies in the interval 122.5 to 127.5. In units of class intervals the
median is

$$\frac{126.69 - 122.5}{5} = 0.838$$

of a unit above the lower bound of the interval and 0.338 of a unit above
the center of the interval.


1 A less accurate method of finding the mean deviation than that presented here is
given in certain current textbooks. Cf. G. Udny Yule, Introduction to the Theory
of Statistics, pp. 145-46; H. Secrist, An Introduction to Statistical Methods, p. 396.
Example 7. Form for the calculation of the mean deviation from the median
M = 126.69 pounds.

  X       f     |X - X_0|    f|X - X_0|
 170       1        9             9
 165       1        8             8
 160       3        7            21
 155       3        6            18
 150      11        5            55
 145      22        4            88
 140      30        3            90
 135      45        2            90
 130      78        1            78
 125      86        0
 120      56        1            56
 115      48        2            96
 110      23        3            69
 105       6        4            24
 100       3        5            15
         416                    717

Correction (1): $(N_b - N_a)c = (136 - 194)(.338) = -19.6$
Correction (2): $(.25 + c^2)N_m = (.25 + .114)86 = +31.3$
Sum of all deviations = 728.7
Mean deviation = $\frac{728.7}{416}$ = 1.752 class intervals
= 8.76 pounds.
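
The two corrections can be combined in a short Python sketch. The function name is hypothetical, and the sketch assumes ascending, equally spaced classes, so that the difference of class positions is the deviation in class-interval units.

```python
def mean_deviation_from_median(freqs, width):
    """Mean deviation from the median, with corrections (1) and (2)."""
    n = sum(freqs)
    cum = 0
    for i, f in enumerate(freqs):      # locate the class holding the median
        if cum + f >= n / 2:
            break
        cum += f
    n_b, n_m = cum, freqs[i]           # below the median class, and within it
    n_a = n - n_b - n_m                # above the median class
    c = (n / 2 - n_b) / n_m - 0.5      # median minus class center, in class units
    rough = sum(f * abs(j - i) for j, f in enumerate(freqs) if j != i)
    total = rough + (n_b - n_a) * c + (0.25 + c * c) * n_m
    return width * total / n

freqs = [3, 6, 23, 48, 56, 86, 78, 45, 30, 22, 11, 3, 3, 1, 1]
md = mean_deviation_from_median(freqs, 5)
print(round(md, 2))  # 8.76
```

Carried without intermediate rounding, the sum of all deviations is about 728.7 class intervals and the mean deviation about 8.76 pounds.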


Quartile deviation or semi-interquartile range. The lower quartile
$Q_1$ is such a value of the variable in a frequency distribution that one
quarter of the total frequency is below $Q_1$, and three quarters above it.
The upper quartile $Q_3$ is such a value that three quarters of the total
frequency is below $Q_3$ and one quarter above it.

The quartile deviation or semi-interquartile range is

$$\frac{Q_3 - Q_1}{2}. \qquad (12)$$

The quartiles, like the median, may be calculated from the frequency
distribution by interpolation. Thus, in example 7, the value with
416/4 = 104 values below it is in the interval 117.5 to 122.5. Below the
lower bound of this interval, there is a frequency 80. Hence, by
proportional parts, $Q_1$ is above this boundary by $\frac{24}{56} \times 5 = 2.14$.
Hence, we have $Q_1 = 117.5 + 2.14 = 119.64$.
Similarly, we have $Q_3 = 132.5 + \frac{12}{45} \times 5 = 133.83$.
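
The quartiles and the quartile deviation of example 2 can be checked with the sketch below. The helper `grouped_quantile` is a hypothetical name for interpolation by proportional parts within the class that contains the required rank.

```python
def grouped_quantile(lower_bounds, freqs, width, rank):
    """Value below which `rank` observations fall, by proportional parts."""
    cum = 0
    for lo, f in zip(lower_bounds, freqs):
        if f > 0 and cum + f >= rank:
            return lo + width * (rank - cum) / f
        cum += f
    raise ValueError("rank exceeds the total frequency")

lower_bounds = [97.5 + 5 * i for i in range(15)]
freqs = [3, 6, 23, 48, 56, 86, 78, 45, 30, 22, 11, 3, 3, 1, 1]
n = sum(freqs)

q1 = grouped_quantile(lower_bounds, freqs, 5, n / 4)
q3 = grouped_quantile(lower_bounds, freqs, 5, 3 * n / 4)
quartile_deviation = (q3 - q1) / 2   # formula (12)
print(round(q1, 2), round(q3, 2), round(quartile_deviation, 2))  # 119.64 133.83 7.1
```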


Formulas for probable errors in certain averages. The meaning of
the probable error in a statistical result obtained from N values of a
variable is presented in a later chapter,¹ but for convenience we give
here formulas for probable errors of certain averages and functions of
averages.

Let PE be the probable error, $\sigma$ the standard deviation, $C$ the
coefficient of variability, the subscript of PE the average or other
statistical result whose probable error is to be given. For a normal²
distribution,

$$(PE)_{AM} = \pm .6745 \frac{\sigma}{\sqrt{N}},$$
$$(PE)_{\sigma} = \pm .6745 \frac{\sigma}{\sqrt{2N}},$$
$$(PE)_{median} = \pm .8454 \frac{\sigma}{\sqrt{N}},$$
$$(PE)_{quartile} = \pm .9191 \frac{\sigma}{\sqrt{N}},$$
$$(PE)_{semi\text{-}interquartile\ range} = \pm .5306 \frac{\sigma}{\sqrt{N}}.$$

For a normal distribution,

$$(PE)_C = \pm .6745 \frac{C}{\sqrt{2N}}\left[1 + 2\left(\frac{C}{100}\right)^2\right]^{1/2} = \pm .4769 \frac{C}{\sqrt{N}}\left[1 + 2\left(\frac{C}{100}\right)^2\right]^{1/2}.$$

The appropriate average to use. The selection of the average which 
is most appropriate for describing a distribution of values depends much 
on the nature of the distribution and on the purpose for which the average 
is obtained. For example, if we wish the average cost of a set of N 
articles purchased at various prices, the most obvious and usual purpose 
would be to find that equal price per article which would give the same 
total cost as the given variable prices. Clearly, the arithmetic mean 
satisfies this condition. 

Next, if we wish the average annual increase in population for ten
years when we have given the rates of gain of each of the ten years, the
main purpose would be to get the equal rate of gain $r$ which would lead
to the same result in ten years as the variable rates $r_1, r_2, \cdots, r_{10}$.
Clearly, we must then find the geometric mean $1 + r$ in the equation

$$(1 + r)^{10} = (1 + r_1)(1 + r_2) \cdots (1 + r_{10}).$$
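
A brief Python sketch of this use of the geometric mean is given below. The ten annual rates are hypothetical figures invented for the illustration, and the function name is our own; `math.prod` requires Python 3.8 or later.

```python
import math

def equivalent_rate(rates):
    """Equal rate r with (1 + r)^n equal to the product of the (1 + r_i)."""
    growth = math.prod(1 + r for r in rates)
    return growth ** (1 / len(rates)) - 1

# Hypothetical annual rates of gain for ten years.
rates = [0.02, 0.03, 0.01, 0.04, 0.02, 0.03, 0.05, 0.01, 0.02, 0.03]
r = equivalent_rate(rates)
print(r)
```

Compounding the single rate `r` for ten years reproduces the growth given by the ten variable rates, which the arithmetic mean of the rates would not do exactly.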


1 Chapter V, “Random Sampling.”
2 For meaning of a normal distribution, see Chapter VII.
Again, if we require for our purpose the middle value of a group, the
median is obviously the average we should find. In another situation, 
we may require the most frequent value. Then the mode is obviously 
the average we should find. 

While it is thus fairly obvious in some cases which average to use, it 
must be granted that for other cases, it is not at all clear, from the 
nature of the data or distribution, which average will serve the purpose 
best. In such cases, more than one average may well be used. For 
certain types of distribution, two or more averages give a much better 
description than any single average. 

If various averages have been computed, but only a single one is to 
be reported, it appears reasonable that some preference should be given 
to the average that has the smallest probable error. 

Skewness. The character of a frequency curve (Fig. 6) with respect
to symmetry or skewness about the maximal ordinate is very commonly
involved in an elementary description of a frequency distribution.

It is important, therefore, that we should have some means of
measuring skewness. In a symmetrical distribution, the arithmetic mean,
mode, and median coincide. In a skew distribution they do not coincide.

Since skewness relates to the shape rather than to the size of the curve, 
it seems appropriate to use a ratio as a measure of skewness. The ratio

$$\text{skewness} = \frac{\text{arithmetic mean} - \text{mode}}{\text{standard deviation}}$$

suggested by Karl Pearson has been much used to measure skewness.
Skewness is also indicated when the quartiles, or pairs of deciles, are not
equidistant from the median. With this fact as a basis, a measure

$$\text{skewness} = \frac{\text{upper quartile} + \text{lower quartile} - 2\ \text{median}}{\text{upper quartile} - \text{lower quartile}}$$

is sometimes used to measure skewness.
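
As an illustration, both measures may be evaluated for the distribution of example 2 from the results found earlier in this chapter. The Python names are our own; the mode used is the approximate value 126.45, so the first measure shares its uncertainty.

```python
# Results obtained above for example 2 (weights in pounds).
mean, median, mode = 127.37, 126.69, 126.45
sigma = 11.20
q1, q3 = 119.64, 133.83

pearson_skewness = (mean - mode) / sigma
quartile_skewness = (q3 + q1 - 2 * median) / (q3 - q1)
print(round(pearson_skewness, 3), round(quartile_skewness, 3))  # 0.082 0.006
```

Both values are small and positive, indicating a slight skewness toward the heavier weights.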

The elementary methods presented in this chapter for describing fre- 
quency distributions by means of averages and measures of dispersion 
characterize only certain features of the distribution. These methods, 
as well as the simplest freehand graphical representations of frequency 
curves, point to the desirability of more complete descriptions of the 
distributions based on interpolation theory and curve fitting. Chapters 
III, IV, and VII deal with these topics. 


CHAPTER III 


INTERPOLATION, SUMMATION, AND GRADUATION 
By JAMES W. GLOVER 


UNDERLYING IDEAS OF INTERPOLATION 


Interpolation by differences. It is often required to interpolate 
additional ordinates in a given series of statistical or functional values 
represented by a set of ordinates. These interpolated values must of 
course closely approximate the true values of the ordinates when they 
are analytical functions of a variable abscissa. The interpolated values
are usually assumed to lie on a parabolic curve passing through the ends
of the given ordinates. For example, $u_0, u_1, u_2, u_3$ determine uniquely
the parabola

$$u_x = a_0 + a_1 x + a_2 x^2 + a_3 x^3 \qquad (1)$$

of the third degree. The coefficients $a_0, a_1, a_2$, and $a_3$ are functions of
the given ordinates and their differences. When $x$ takes the values
0, 1, 2, 3, the function $u_x$ takes the values $u_0, u_1, u_2, u_3$, respectively.
If $x$ is given a value between 0 and 1, the function $u_x$ takes the value
assigned to the corresponding interpolated ordinate.

The coefficients in (1) could be determined by substituting and solving
the equations
\[
\begin{aligned}
u_0 &= a_0, \\
u_1 &= a_0 + a_1 + a_2 + a_3, \\
u_2 &= a_0 + 2a_1 + 4a_2 + 8a_3, \\
u_3 &= a_0 + 3a_1 + 9a_2 + 27a_3.
\end{aligned}
\tag{2}
\]

However, by employing the methods of finite differences, expressions for
$u_x$ are obtained with less labor.

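Let $u_x$ denote a function of the variable $x$ and consider a succession of
equidistant ordinates, that is, ordinates separated by equal intervals
(taken as unit intervals) along the $x$-axis:
\[ \cdots,\ u_{-3},\ u_{-2},\ u_{-1},\ u_0,\ u_1,\ u_2,\ u_3,\ \cdots \tag{3} \]
Then by definition the successive differences are
\[
\begin{aligned}
&\Delta u_{-3} = u_{-2} - u_{-3}, \quad \Delta u_{-2} = u_{-1} - u_{-2}, \quad \cdots, \quad \Delta u_2 = u_3 - u_2, \quad \cdots \\
&\Delta^2 u_{-3} = \Delta u_{-2} - \Delta u_{-3}, \quad \Delta^2 u_{-2} = \Delta u_{-1} - \Delta u_{-2}, \quad \cdots, \quad \Delta^2 u_1 = \Delta u_2 - \Delta u_1, \quad \cdots
\end{aligned}
\tag{4}
\]
and so on.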

Horizontal difference table. Having given a succession of values of
a function, the process of differencing may be repeated and a horizontal
table of differences is developed as follows:


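TABLE I. HORIZONTAL DIFFERENCE TABLE

\[
\begin{array}{r|lllllll}
x & u_x & \Delta u_x & \Delta^2 u_x & \Delta^3 u_x & \Delta^4 u_x & \Delta^5 u_x & \Delta^6 u_x \\ \hline
-3 & u_{-3} & \Delta u_{-3} & \Delta^2 u_{-3} & \Delta^3 u_{-3} & \Delta^4 u_{-3} & \Delta^5 u_{-3} & \Delta^6 u_{-3} \\
-2 & u_{-2} & \Delta u_{-2} & \Delta^2 u_{-2} & \Delta^3 u_{-2} & \Delta^4 u_{-2} & \Delta^5 u_{-2} & \\
-1 & u_{-1} & \Delta u_{-1} & \Delta^2 u_{-1} & \Delta^3 u_{-1} & \Delta^4 u_{-1} & & \\
 0 & u_0 & \Delta u_0 & \Delta^2 u_0 & \Delta^3 u_0 & & & \\
 1 & u_1 & \Delta u_1 & \Delta^2 u_1 & & & & \\
 2 & u_2 & \Delta u_2 & & & & & \\
 3 & u_3 & & & & & &
\end{array}
\]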


The column headed $x$ gives the values of the variable $x$, sometimes
called the argument. The remaining columns give the values of the
function and successive differences. The differences in the same row
with the function have the same subscript as the function and are
called the leading differences of that value of the function.
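
The construction of the difference columns may be sketched in Python as follows. This is only an illustration: the function names are hypothetical, and the test data (the cubes of 0 to 4) are an assumption introduced here, not taken from the text.

```python
def difference_table(values):
    """Return the columns [u, Δu, Δ²u, ...] of a difference table.

    Column k holds the k-th differences; entry j of column k is Δ^k u_j.
    """
    columns = [list(values)]
    while len(columns[-1]) > 1:
        prev = columns[-1]
        columns.append([b - a for a, b in zip(prev, prev[1:])])
    return columns


def leading_differences(values):
    """Leading differences of the first value: u_0, Δu_0, Δ²u_0, ..."""
    return [column[0] for column in difference_table(values)]


# Illustrative data (an assumption): u_x = x^3 for x = 0, 1, 2, 3, 4.
cubes = [x ** 3 for x in range(5)]
for order, column in enumerate(difference_table(cubes)):
    print(order, column)
```

For a function of the third degree the third differences are constant and the fourth vanish, which the printed columns show.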

Fundamental relations in difference table. The truth of the follow- 
ing statements is evident from the construction of the difference table. 

(a) The columns to the right of any given column contain the suc- 
cessive differences of that column. 

(b) The first column to the left of any column, when differenced,
will produce that column; the second column to the left, when
differenced twice, will produce that column, and so on.

If we define $\Delta^{-1}u_x$ as a function whose first difference is $u_x$, then the
columns to the left of any column are (apart from arbitrary constants
and periodic functions of the variable) the $\Delta^{-1}u_x$, $\Delta^{-2}u_x$, $\Delta^{-3}u_x$, and so
on, of that column.

(c) If a column is inverted and the successive differences formed, 
the even differences are not affected and the odd differences are changed 
in sign only. 

(d) The sum of any two successive differences in any row in a horizontal
difference table is equal to the difference in the next row directly
under the first difference. In symbols:
\[ \Delta^n u_x + \Delta^{n+1} u_x = \Delta^n u_{x+1}. \tag{5} \]
These three differences lie in pairs in a row, in a column, and in a
diagonal.

(e) The sum of a series of successive terms in any column of a horizontal
difference table is equal to the following term in the next row and
in the column to the left minus the term in the same row with the first
term and in the column to the left. In symbols:
\[ \sum_{x=a}^{b} \Delta^n u_x = \Delta^{n-1}u_{b+1} - \Delta^{n-1}u_a = \Delta^{n-1}u_x \Big|_{a}^{b+1}. \tag{6} \]


INTERPOLATION FORMULAS 


Newton’s interpolation formula. It is possible to express the function
$u_x$ in terms of the function and leading differences of any row. For
example,
\[ u_x = u_0 + x\,\Delta u_0 + x_2\Delta^2 u_0 + x_3\Delta^3 u_0 + x_4\Delta^4 u_0 + x_5\Delta^5 u_0 + \cdots, \tag{7} \]
where
\[ x_n = \frac{x(x-1)(x-2)\cdots(x-n+1)}{n!}. \tag{8} \]
In terms of the horizontal difference table, the values in any row are
employed to obtain the value in the first column corresponding to the
argument $x$. This is known as Newton’s interpolation formula.


Example 1. The values of
\[ u_t = \frac{1}{\sqrt{2\pi}} \int_0^t e^{-y^2/2}\,dy, \tag{9} \]
the area under the normal curve of error to the right of the origin, are given at
intervals of 1/2 from t = 0 to t = 4 as follows:

      t      u_t         t      u_t         t      u_t
     .00   .00000      1.50   .43319      3.00   .49865
     .50   .19146      2.00   .47725      3.50   .49977
    1.00   .34134      2.50   .49379      4.00   .49997

Find the value of the area under the curve by Newton’s interpolation formula
when t = 1.22.

The horizontal difference table, with the intervals between the ordinates
taken as unit intervals, assumes the following form:


TABLE II

      t      $\Delta^3 u_t$    $\Delta^4 u_t$
     .00    -.01645     .02670
     .50     .01024     .01003
    1.00     .02027    -.00444
    1.50     .01584

With $u_0$ taken at t = 1.00, the argument t = 1.22 corresponds to
$x = (1.22 - 1.00)/.50 = .44$. Substitution in (7) gives
\[
\begin{aligned}
u_{.44} &= .34134 + .44(.09185) + (-.1232)(-.04779) \\
        &\quad + .06406(.02027) + (-.041)(-.00444) \\
        &= .34134 + .04041 + .00589 + .00130 + .00018 = .38912.
\end{aligned}
\]

The true value is .38877, hence the error is .00035.
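
The computation of Example 1 may be checked with a short Python sketch. It is an illustration only: the function names are hypothetical, and the leading differences are those quoted in the substitution above.

```python
def binomial(x, n):
    """x_n = x(x-1)...(x-n+1)/n!, as in (8), for any real x."""
    result = 1.0
    for k in range(n):
        result *= (x - k) / (k + 1)
    return result


def newton_forward(leading_diffs, x):
    """Evaluate (7) from [u_0, Δu_0, Δ²u_0, ...] at the argument x."""
    return sum(binomial(x, n) * d for n, d in enumerate(leading_diffs))


# u_0 (at t = 1.00) and its leading differences as quoted in Example 1.
leading = [0.34134, 0.09185, -0.04779, 0.02027, -0.00444]
x = (1.22 - 1.00) / 0.50      # the tabular interval is one half
print(round(newton_forward(leading, x), 5))   # about .38912, as in the text
```

The sum is truncated at fourth differences, as in the example; further terms could be added by extending the list of leading differences.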


Diagonal difference table. By (5) any term in the horizontal dif- 
ference table can be expressed in terms of two differences in the row 
above, or the column to the left, or the diagonal below. In this manner, 
Newton’s formula can be transformed by successive steps into other 
forms. These steps will perhaps be better understood by writing the 
horizontal difference table in the diagonal difference table form. 


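TABLE III. DIAGONAL DIFFERENCE TABLE

\[
\begin{array}{r|lllllll}
x & u_x & \Delta u_x & \Delta^2 u_x & \Delta^3 u_x & \Delta^4 u_x & \Delta^5 u_x & \Delta^6 u_x \\ \hline
-3 & u_{-3} & & & & & & \\
   & & \Delta u_{-3} & & & & & \\
-2 & u_{-2} & & \Delta^2 u_{-3} & & & & \\
   & & \Delta u_{-2} & & \Delta^3 u_{-3} & & & \\
-1 & u_{-1} & & \Delta^2 u_{-2} & & \Delta^4 u_{-3} & & \\
   & & \Delta u_{-1} & & \Delta^3 u_{-2} & & \Delta^5 u_{-3} & \\
 0 & u_0 & & \Delta^2 u_{-1} & & \Delta^4 u_{-2} & & \Delta^6 u_{-3} \\
   & & \Delta u_0 & & \Delta^3 u_{-1} & & \Delta^5 u_{-2} & \\
 1 & u_1 & & \Delta^2 u_0 & & \Delta^4 u_{-1} & & \\
   & & \Delta u_1 & & \Delta^3 u_0 & & & \\
 2 & u_2 & & \Delta^2 u_1 & & & & \\
   & & \Delta u_2 & & & & & \\
 3 & u_3 & & & & & &
\end{array}
\]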


The diagonal table may be thought of as obtained from the horizon- 
tal table by turning the line of leading differences in any row about the 
function at the beginning of that row as a pivot through a downward 
angle until the even differences lie in the first, second, and so on, rows, 
respectively, below their original positions. In the new positions the 
leading differences of a function lie in the downward diagonal setting 
out from the function. Accordingly Newton’s interpolation formula for
$u_x$ expresses it in terms lying on a downward diagonal in the diagonal
difference table.

The interpolation formulas of Gauss and Stirling’s central difference
interpolation formula. By employing (5),

$\Delta^2 u_0$ can be expressed in terms of $\Delta^2 u_{-1}$, $\Delta^3 u_{-1}$;

$\Delta^3 u_0$ in terms of $\Delta^3 u_{-1}$, $\Delta^4 u_{-2}$, $\Delta^5 u_{-2}$;

and so on, with the result:
\[
\begin{aligned}
u_x = u_0 &+ x\,\Delta u_0 + x_2\Delta^2 u_{-1} + (x+1)_3\Delta^3 u_{-1} \\
          &+ (x+1)_4\Delta^4 u_{-2} + (x+2)_5\Delta^5 u_{-2} + \cdots
\end{aligned}
\tag{10}
\]

This interpolation formula is due to Gauss. It employs the odd
differences just below the central line from $u_0$ and the even differences
on the central line.

Another interpolation formula of Gauss may be derived in a similar
manner; it employs the odd differences just above the central line
from $u_0$ and the even differences on the central line, and is expressed
as follows:
\[
\begin{aligned}
u_x = u_0 &+ x\,\Delta u_{-1} + (x+1)_2\Delta^2 u_{-1} + (x+1)_3\Delta^3 u_{-2} \\
          &+ (x+2)_4\Delta^4 u_{-2} + (x+2)_5\Delta^5 u_{-3} + \cdots
\end{aligned}
\tag{11}
\]


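The mean of (10) and (11) is
\[
\begin{aligned}
u_x = u_0 &+ x\left(\frac{\Delta u_{-1} + \Delta u_0}{2}\right) + \frac{x^2}{2}\,\Delta^2 u_{-1} \\
          &+ (x+1)_3\left(\frac{\Delta^3 u_{-2} + \Delta^3 u_{-1}}{2}\right) + \frac{x}{4}(x+1)_3\,\Delta^4 u_{-2} \\
          &+ (x+2)_5\left(\frac{\Delta^5 u_{-3} + \Delta^5 u_{-2}}{2}\right) + \frac{x}{6}(x+2)_5\,\Delta^6 u_{-3} + \cdots
\end{aligned}
\tag{12}
\]

This is the well-known central-difference formula, sometimes called
Stirling’s formula. Proceeding from the function $u_0$ in the diagonal
difference table, it employs the mean of the odd differences above and
below the central line and the even differences on the central line.¹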


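Example 2. When Stirling’s interpolation formula is applied to Table II, the area
derived for (9) when t = 1.22 is
\[
\begin{aligned}
u_{.44} &= .34134 + .44\left(\frac{.14988 + .09185}{2}\right) + .0968(-.05803) \\
        &\quad + (-.05914)\left(\frac{-.01645 + .01024}{2}\right) + (-.0065)(.02670) \\
        &= .34134 + .05318 - .00562 + .00018 - .00017 = .38891.
\end{aligned}
\]
Error = + .00014.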
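
Example 2 may be reproduced with the following Python sketch of Stirling’s formula through fourth differences. The function name and argument layout are hypothetical; the first difference $\Delta u_{-1}$ = .34134 - .19146 is computed here from the tabulated ordinates, which is an assumption about how the original table entry was formed.

```python
def stirling(u0, d1_pair, d2, d3_pair, d4, x):
    """Stirling's formula (12), truncated after the fourth difference.

    d1_pair = (Δu_{-1}, Δu_0),   d2 = Δ²u_{-1},
    d3_pair = (Δ³u_{-2}, Δ³u_{-1}),   d4 = Δ⁴u_{-2}.
    """
    c3 = (x + 1) * x * (x - 1) / 6            # (x+1)_3
    return (u0
            + x * sum(d1_pair) / 2
            + x * x / 2 * d2
            + c3 * sum(d3_pair) / 2
            + x / 4 * c3 * d4)


value = stirling(0.34134, (0.14988, 0.09185), -0.05803,
                 (-0.01645, 0.01024), 0.02670, 0.44)
print(round(value, 5))    # about .38891, as in the text
```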


1 William Chauvenet, Spherical and Practical Astronomy, vol. 1, pp. 79-85.

Bessel’s interpolation formula. If $u_x$ is obtained from Newton’s
formula by setting out from $u_1$ after inverting the columns of the diagonal
difference table, the result is to express $u_x$ in the diagonal running
upward and to the right which sets out from $u_1$ and passes through $\Delta u_0$.
The formula is
\[
\begin{aligned}
u_x = u_1 &+ (x-1)\Delta u_0 + x_2\Delta^2 u_{-1} + (x+1)_3\Delta^3 u_{-2} \\
          &+ (x+2)_4\Delta^4 u_{-3} + (x+3)_5\Delta^5 u_{-4} + \cdots
\end{aligned}
\tag{13}
\]
and if the mean of this and Newton’s formula is taken, Bessel’s formula
is obtained.¹
\[
\begin{aligned}
u_x = u_0 &+ x\,\Delta u_0 + x_2\left(\frac{\Delta^2 u_{-1} + \Delta^2 u_0}{2}\right) + x_2\left(\frac{x - \tfrac12}{3}\right)\Delta^3 u_{-1} \\
          &+ (x+1)_4\left(\frac{\Delta^4 u_{-2} + \Delta^4 u_{-1}}{2}\right) + (x+1)_4\left(\frac{x - \tfrac12}{5}\right)\Delta^5 u_{-2} + \cdots
\end{aligned}
\tag{14}
\]

To obtain the terms in Bessel’s formula from the diagonal difference
table draw lines just below $u_0$ and above $u_1$ and take the odd differences
inclosed between these lines and the mean of the even differences just
above and below these lines.²


Example 3. Bessel’s interpolation formula gives the area (9) under the normal
curve of error when t = 1.22 as follows:
\[
\begin{aligned}
u_{.44} &= .34134 + .44(.09185) + (-.1232)\left(\frac{-.05803 - .04779}{2}\right) \\
        &\quad + .00246(.01024) + .02306\left(\frac{.02670 + .01003}{2}\right) \\
        &= .34134 + .04041 + .00652 + .00003 + .00042 = .38872.
\end{aligned}
\]
Error = - .00005.
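
A Python sketch of Bessel’s formula through fourth differences, offered as an illustration with hypothetical names, reproduces the figure of Example 3 from the differences about $u_0$ at t = 1.00.

```python
def bessel(u0, d1, d2_pair, d3, d4_pair, x):
    """Bessel's formula (14), truncated after the fourth difference.

    d1 = Δu_0,   d2_pair = (Δ²u_{-1}, Δ²u_0),
    d3 = Δ³u_{-1},   d4_pair = (Δ⁴u_{-2}, Δ⁴u_{-1}).
    """
    c2 = x * (x - 1) / 2                          # x_2
    c4 = (x + 1) * x * (x - 1) * (x - 2) / 24     # (x+1)_4
    return (u0
            + x * d1
            + c2 * sum(d2_pair) / 2
            + (x - 0.5) / 3 * c2 * d3
            + c4 * sum(d4_pair) / 2)


value = bessel(0.34134, 0.09185, (-0.05803, -0.04779), 0.01024,
               (0.02670, 0.01003), 0.44)
print(round(value, 5))    # about .38872, as in the text
```

The factor `(x - 0.5)` in the odd-difference term makes that term vanish at the middle of the interval.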


The reader is referred to the treatise of Rice³ for a detailed explanation
of difference tables and proofs of the interpolation formulas of Newton,
Stirling, and Bessel. This book contains numerous applications
of these formulas to problems in practical astronomy and will be found
very helpful and suggestive.

Everett’s interpolation formula. Another formula to which attention
should be called was proposed by Everett.⁴
\[
\begin{aligned}
u_x &= \xi u_0 + (\xi+1)_3\Delta^2 u_{-1} + (\xi+2)_5\Delta^4 u_{-2} + (\xi+3)_7\Delta^6 u_{-3} + \cdots \\
    &\quad + x u_1 + (x+1)_3\Delta^2 u_0 + (x+2)_5\Delta^4 u_{-1} + (x+3)_7\Delta^6 u_{-2} + \cdots
\end{aligned}
\tag{15}
\]
where $\xi = 1 - x$.

In this formula only even differences occur and they lie along horizontal
lines drawn through $u_0$ and $u_1$, respectively, in the diagonal
difference table.

Since
\[ \xi u_0 + x u_1 = u_0 + x\,\Delta u_0, \]
this formula may be written in the following convenient form for
computation:
\[
\begin{aligned}
u_x = u_0 &+ x\,\Delta u_0 + (x+1)_3\Delta^2 u_0 + (\xi+1)_3\Delta^2 u_{-1} \\
          &+ (x+2)_5\Delta^4 u_{-1} + (\xi+2)_5\Delta^4 u_{-2} \\
          &+ (x+3)_7\Delta^6 u_{-2} + (\xi+3)_7\Delta^6 u_{-3} + \cdots
\end{aligned}
\tag{16}
\]

1 Elias Loomis, An Introduction to Practical Astronomy, pp. 202-07.

2 William Chauvenet, loc. cit., pp. 86-87.

3 Herbert L. Rice, The Theory and Practice of Interpolation, Chaps. I and II,
pp. 1-95.

4 J. D. Everett, “On a New Interpolation Formula,” Journal of the Institute of
Actuaries, vol. 35, pp. 452-58.


Example 4. Everett’s interpolation formula gives the area (9) under the normal
curve of error when t = 1.22 as follows:
\[
\begin{aligned}
u_{.44} &= .34134 + .44(.09185) + (-.05914)(-.04779) \\
        &\quad + (-.06406)(-.05803) + .01125(.01003) + .01181(.02670) \\
        &= .34134 + .04041 + .00283 + .00372 + .00011 + .00032 = .38873.
\end{aligned}
\]
Error = - .00004.
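
Everett’s formula in the computing form (16) may be sketched in Python as follows. The names are hypothetical; the differences are those used in Example 4.

```python
def central_coeff(y, n):
    """(y+n)_{2n+1} = (y+n)(y+n-1)...(y-n) / (2n+1)!"""
    result = 1.0
    for k in range(2 * n + 1):
        result *= (y + n - k) / (k + 1)
    return result


def everett(u0, d1, even_through_u1, even_through_u0, x):
    """Form (16) of Everett's formula.

    even_through_u1 = [Δ²u_0, Δ⁴u_{-1}, ...]   (horizontal line through u_1)
    even_through_u0 = [Δ²u_{-1}, Δ⁴u_{-2}, ...] (horizontal line through u_0)
    """
    xi = 1.0 - x
    total = u0 + x * d1
    pairs = zip(even_through_u1, even_through_u0)
    for n, (upper, lower) in enumerate(pairs, start=1):
        total += central_coeff(x, n) * upper + central_coeff(xi, n) * lower
    return total


value = everett(0.34134, 0.09185,
                [-0.04779, 0.01003], [-0.05803, 0.02670], 0.44)
print(round(value, 5))    # about .38873, as in the text
```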


The choice of interpolation formulas. Where possible a central-difference
formula should be used which involves the ordinates preceding
and following the ordinate $u_0$ selected as the origin. No hard-and-fast
rule can be laid down as to the choice between central-difference
formulas, although it may be remarked that in practice Everett’s and
Bessel’s formulas are usually preferred when the interpolated ordinate
is for a value of $x$ in the interval .25 to .75, particularly for values near
.50. When $x$ = .50, the coefficients of odd differences vanish in Bessel’s
formula.

Newton’s formula should be used only when interpolated values are 
required near the beginning of a set of tabular ordinates. When the in- 
terpolated values desired are near the end of the set, Newton’s formula 
may be applied to the ordinates inverted in order. 

All these formulas express an ordinate at one point in terms of the 
ordinate at another point and the distance x between them and certain 
differences of the ordinates near the ordinate from which the inter- 
polation sets out. The formulas have been illustrated with respect to 
the diagonal difference table, but it is a simple matter to pick out the 
corresponding terms in the horizontal difference table, and this form is 
suggested as more desirable to use in practice. For example, in Newton’s
formula, the coefficients are 1, $x$, $x_2$, $x_3$, and so on, and the differences
are in the same row with the function from which we set out.

In the first formula of Gauss, the coefficients are 1, $x$, $x_2$, $(x+1)_3$,
$(x+1)_4$, $(x+2)_5$, $\cdots$ with function and first difference in the same row,
next two differences step up one row, next two step up another row,
and so on. Simple rules for deriving the coefficients in most of these
formulas are given by Chauvenet.¹

1 William Chauvenet, loc. cit., pp. 79-91.

Tables of values to five places of decimals of the coefficients up to
and including fifth differences in Newton’s, Stirling’s, and Bessel’s and
sixth differences in Everett’s formulas with the argument $x$ varying by
intervals of .01 from 0 to 1 are given by Glover.¹ Extensive tables to
ten places of decimals up to and including sixth differences in Everett’s
formula with the argument varying by intervals of .001 from 0 to 1
are given by Thompson.²


SYSTEMATIC INTERPOLATION BY CONTINUOUS METHODS 


Subdivision of intervals. The formulas already given are useful 
when only a few values are to be interpolated. When, however, a table 
is to be extended by the interpolation of many values, for example, nine 
new values between every pair of original values, it is better to adopt 
a continuous process. This is accomplished by expressing the leading 
differences of the subdivided minor intervals in terms of the leading 
differences of the original major intervals. The following example
illustrates this process. 


Example 5. The values of the Gamma Function are given at intervals of .1 from
1 to 1.5, both inclusive, and the continuous process is employed to interpolate nine
values between 1.2 and 1.3. (The tabulated entries are the common logarithms
of Γ(x) increased by 1, so that the entry for x = 1 is 1.0000000.)

The horizontal difference table for the Gamma Function and major differences
is as follows:

TABLE IV

      x       Γ(x)        ΔΓ(x)       Δ²Γ(x)      Δ³Γ(x)      Δ⁴Γ(x)      Δ⁵Γ(x)
     1.0   1.0000000   -.0216593   .0062411   -.0007251   .0001438   -.0000376
     1.1    .9783407   -.0154182   .0055160   -.0005813   .0001062
     1.2    .9629225   -.0099022   .0049347   -.0004751
     1.3    .9530203   -.0049675   .0044596
     1.4    .9480528   -.0005079
     1.5    .9475449

Subdivision into ten minor intervals by Newton’s interpolation
formula. The leading minor differences are then calculated by the
following formula,³ based on Newton’s interpolation formula for
subdividing the middle major interval into ten minor intervals:
\[ \delta^n u_0 = a\,\Delta u_{-2} + b\,\Delta^2 u_{-2} + c\,\Delta^3 u_{-2} + d\,\Delta^4 u_{-2} + e\,\Delta^5 u_{-2} \tag{17} \]

1 James W. Glover, Tables of Applied Mathematics in Finance, Insurance and
Statistics, pp. 412-19.

2 A. J. Thompson, Table of Coefficients of Everett’s Central-Difference Interpolation
Formula. Tracts for Computers, Edited by Karl Pearson, No. V (1921).

3 George King, Textbook of the Institute of Actuaries, Part II, second edition, pp.
441-53.

The coefficients are given in the following table:¹

TABLE V

                     d            e
     $\delta u_0$      -.0086625    .00329175
     $\delta^2 u_0$    -.0002750   -.00024750
     $\delta^3 u_0$     .0006500   -.00023750
     $\delta^4 u_0$     .0001000    .00002000
     $\delta^5 u_0$                 .00001000

These coefficients are obtained by using Newton’s formula² (7) to
interpolate nine equidistant ordinates between $u_0$ and $u_1$. The interpolated
values are then differenced to express the leading minor differences
of $u_0$ in terms of the leading major differences of $u_0$. Finally,
employing (5), the leading major differences of $u_0$ are expressed in terms
of the leading major differences of $u_{-2}$.

Having the value of the Gamma Function for the argument 1.2,
namely, $u_0$ = .9629225, and the first five leading minor differences at
this point, the nine interpolated values are easily calculated. The
leading fifth minor difference vanishes and the leading fourth minor
difference is equal to 1 in the eighth decimal place. Taking the fourth
minor difference as constant and employing (5), the interpolated values
are easily obtained by successive addition, as shown below. Decimal
points are omitted in the calculation.


TABLE VI

     δ⁴Γ(x)   δ³Γ(x)   δ²Γ(x)     δΓ(x)       Γ(x)       x
        1      -62      5441    -122785    96292250    1.20
               -61      5379    -117344    96169465    1.21
               -60      5318    -111965    96052121    1.22
               -59      5258    -106647    95940156    1.23
               -58      5199    -101389    95833509    1.24
               -57      5141     -96190    95732120    1.25
               -56      5084     -91049    95635930    1.26
               -55      5028     -85965    95544881    1.27
                        4973     -80937    95458916    1.28
                                 -75964    95377979    1.29
                                           95302015    1.30

1 James W. Glover, Tables of Applied Mathematics (1923), p. 428.
2 Alfred Henry, Calculus and Probability (1922), pp. 23-26.

The interpolated values of the Gamma Function obtained by this
method are correct to six places of decimals.
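
The process of successive addition used in Table VI may be sketched in Python as follows. The function name is hypothetical; the starting value and leading minor differences are those of the first row of Table VI, with the fourth difference held constant.

```python
def extend_by_addition(u0, leading_diffs, steps):
    """Rebuild a column from u_0 and its leading differences [δu, δ²u, ...]
    by repeated use of relation (5). The last difference supplied is
    treated as constant."""
    diffs = list(leading_diffs)
    values = [u0]
    for _ in range(steps):
        values.append(values[-1] + diffs[0])
        # Advance each difference one row: Δ^k u_{x+1} = Δ^k u_x + Δ^{k+1} u_x.
        for k in range(len(diffs) - 1):
            diffs[k] += diffs[k + 1]
    return values


# Decimal points omitted, as in Table VI.
column = extend_by_addition(96292250, [-122785, 5441, -62, 1], 10)
for value in column:
    print(value)
```

Working in whole numbers, as the text does, avoids any rounding in the additions.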

Osculatory interpolation formulas. The interpolations in the interval
1.2 to 1.3 were derived from the leading minor differences of 1.2, which
in turn were derived from the leading major differences of 1.0. These
leading major differences were calculated from the six ordinates of the
function from x = 1.0 to 1.5. In a similar manner the interpolations
in the interval 1.3 to 1.4 would be based on the values of the six ordinates
from x = 1.1 to 1.6; the interpolations in the interval 1.4 to 1.5, on the
six ordinates from x = 1.2 to 1.7, and so on. The nine interpolations in
each interval necessarily run smoothly because they lie on the same
parabola; however, from interval to interval different parabolas are
used, and at their points of junction breaks in the continuity or
smoothness of the interpolated values are likely to occur.

This difficulty is overcome by what is known as osculatory interpola- 
tion. The coefficients in the osculatory formulas are determined by the 
condition that at junction points the successive interpolation parabolas 
shall have a common ordinate, slope and (in case of fifth differences) 
osculating circle. This is effected by making the first and second de- 
rivatives of each successive pair of interpolation parabolas equal, re- 
spectively, at the common ordinate, that is, at the junction points. The 
third and fifth difference osculatory interpolation formulas are: 


\[ u_x = u_{-1} + (x+1)\Delta u_{-1} + (x+1)_2\Delta^2 u_{-1} + \frac{x^2(x-1)}{2}\,\Delta^3 u_{-1} \tag{18} \]

\[
\begin{aligned}
u_x = u_{-2} &+ (x+2)\Delta u_{-2} + (x+2)_2\Delta^2 u_{-2} + (x+2)_3\Delta^3 u_{-2} \\
             &+ (x+2)_4\Delta^4 u_{-2} + \frac{x^3(x-1)(5x-7)}{24}\,\Delta^5 u_{-2}
\end{aligned}
\tag{19}
\]
It will be observed that (18) is the same as Newton’s formula in
terms of $u_{-1}$ and its leading differences except for the coefficient of the
third difference; similarly (19) is the same as Newton’s formula expressed
in terms of $u_{-2}$ and its leading differences except for the coefficient of
the fifth difference.
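
The osculatory property may be checked numerically for the third difference formula. The Python sketch below is an illustration under stated assumptions: the function names are hypothetical, the third difference coefficient is taken as $x^2(x-1)/2$, the ordinates are those of the normal area at t = .2, .4, .6, .8, 1.0 (Table VII), and the slopes are estimated by a small finite step rather than by exact differentiation.

```python
def osculatory3(u_m1, u0, u1, u2, x):
    """Third difference osculatory formula for 0 <= x <= 1 (u_0 to u_1)."""
    d1 = u0 - u_m1                        # Δu_{-1}
    d2 = u1 - 2 * u0 + u_m1               # Δ²u_{-1}
    d3 = u2 - 3 * u1 + 3 * u0 - u_m1      # Δ³u_{-1}
    return (u_m1 + (x + 1) * d1 + (x + 1) * x / 2 * d2
            + x * x * (x - 1) / 2 * d3)


ordinates = [0.079260, 0.155422, 0.225747, 0.288145, 0.341345]


def left(x):
    """Curve for the interval t = .4 to .6."""
    return osculatory3(*ordinates[0:4], x)


def right(x):
    """Curve for the interval t = .6 to .8."""
    return osculatory3(*ordinates[1:5], x)


h = 1e-6
slope_left = (left(1.0) - left(1.0 - h)) / h
slope_right = (right(h) - right(0.0)) / h
print(left(1.0), right(0.0))        # common ordinate at the junction
print(slope_left, slope_right)      # nearly equal slopes at the junction
```

Both slopes should agree closely with half the difference of the ordinates on either side of the junction, (.288145 - .155422)/2.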
Fifth difference osculatory interpolation. The following example 
illustrates the method of continuous interpolation by (19). 


Example 6. The values of the area to the right of the origin under the normal
curve of error are given by major intervals of 0.2 from t = 0 to 1.2. Fifth difference
osculatory interpolation is to be effected in minor intervals of .04 between the
third and fourth ordinates and the fourth and fifth ordinates. The horizontal
difference table is given in Table VII.

TABLE VII

      t     x     $u_x$      $\Delta u_x$     $\Delta^2 u_x$     $\Delta^3 u_x$     $\Delta^4 u_x$    $\Delta^5 u_x$
     .00   -2   .000000   .079260   -.003098   -.002739   .000649   .000170
     .20   -1   .079260   .076162   -.005837   -.002090   .000819   .000035
     .40    0   .155422   .070325   -.007927   -.001271   .000854
     .60    1   .225747   .062398   -.009198   -.000417
     .80    2   .288145   .053200   -.009615
    1.00    3   .341345   .043585
    1.20    4   .384930


The leading major differences are used to compute the leading minor
differences and with the latter the interpolated values are obtained
by successive addition. The following table¹ gives the values of the
coefficients of leading major differences in any row to compute the leading
minor differences for the argument two rows below when six ordinates
are given and the fifth difference osculatory interpolation formula is
used for subdivision of the major interval into five minor intervals.


TABLE VIII


When these coefficients are used with the leading major differences
of $u_{-2}$, the leading minor differences of $u_0$ are obtained; when they are
used with the leading major differences of $u_{-1}$, the leading minor
differences of $u_1$ are obtained. Since $u_0$ and $u_1$ are known, the interpolated
values in the intervals $u_0$ to $u_1$, and $u_1$ to $u_2$ are obtained by continuous
addition as shown in Table IX.

The numbers in italics, .2257474 and .2881458 would have been re- 
produced exactly as .225747 and .288145, respectively, if all the fig- 
ures in the computed leading minor differences had been retained; this 
is quite unnecessary, however, as seven places of decimals in these dif- 
ferences is sufficient to insure accuracy to five places in the interpolated 
values. Similar remarks apply to the italicized number 95302015 in 
Table VI. 

1 James W. Glover, loc. cit., p. 428.

TABLE IX

     $\delta^5 u_x$     $\delta^4 u_x$     $\delta^3 u_x$      $\delta^2 u_x$      $\delta u_x$       $u_x$         t
    .0000014   .0000005   -.0000186   -.0002535   .0146085   .155422     .40
               .0000019   -.0000181   -.0002721   .0143550   .1700305    .44
                          -.0000162   -.0002902   .0140829   .1843855    .48
                                      -.0003064   .0137927   .1984684    .52
                                                  .0134863   .2122611    .56
                                                             .2257474    .60

    .0000003   .0000012   -.0000116   -.0003323   .0131663   .225747     .60
               .0000015   -.0000104   -.0003439   .0128340   .2389133    .64
                          -.0000089   -.0003543   .0124901   .2517473    .68
                                      -.0003632   .0121358   .2642374    .72
                                                  .0117726   .2763732    .76
                                                             .2881458    .80


References¹ on osculatory interpolation are given below for the reader
who wishes to pursue the subject further.

Subdivision into halves.² The subdivision of an interval is sometimes
performed by successively dividing the interval into halves. This process
is called interpolation into the middle. When $x = \tfrac12$ in Bessel’s formula,
the middle ordinate becomes
\[ u_{1/2} = \frac{u_0 + u_1}{2} - \frac{1}{8}\left(\frac{\Delta^2 u_{-1} + \Delta^2 u_0}{2}\right) + \frac{3}{128}\left(\frac{\Delta^4 u_{-2} + \Delta^4 u_{-1}}{2}\right) - \cdots \tag{20} \]
or in decimal form
\[ u_{1/2} = .5(u_0 + u_1) - .125\left(\frac{\Delta^2 u_{-1} + \Delta^2 u_0}{2}\right) + .0234375\left(\frac{\Delta^4 u_{-2} + \Delta^4 u_{-1}}{2}\right). \tag{21} \]


1Thomas Bond Sprague, “Explanation of a New Formula for Interpolation,” 
Journal of the Institute of Actuaries, vol. 22, pp. 270-85. 

Johannes Karup, ‘‘On a New Mechanical Method of Graduation,” Transactions 
of the Second International Actuarial Congress, pp. 78-109. 

George King, “On the Construction of Mortality Tables from Census Returns 
and Records of Deaths,” Journal of the Institute of Actuaries, vol. 42, pp. 225-77. 

George King, “On a New Method of Constructing and Graduating Mortality and
Other Tables,” Journal of the Institute of Actuaries, vol. 43, pp. 109-84.

James Buchanan, “Osculatory Interpolation by Central Differences: with an 
Application to Life Table Construction,” Journal of the Institute of Actuaries, vol. 
42, pp. 369-94. 

George J. Lidstone, “Alternative Demonstration of the Formula for Osculatory 
Interpolation,” Journal of the Institute of Actuaries, vol. 42, pp. 394-400. 

James W. Glover, “Derivation of the United States Mortality Table by Oscula- 
tory Interpolation,” Quarterly Publication of American Statistical Association, vol. 
12, pp. 85-109. 

United States Life Tables, 1890, 1901, 1910, and 1901-10, pp. 344-48 and 372-88. 

2 William Chauvenet, loc. cit., pp. 87-88. Herbert L. Rice, loc. cit., pp. 78-96.

Since the coefficients of differences of odd order in Bessel’s formula
vanish when $x = \tfrac12$, the first two terms of the above formula give the
middle ordinate correct to third differences and the first three terms
correct to fifth differences.

Formula (20) is frequently employed to halve intervals and in many
textbooks is put in the form of a rule. The rule may be stated as follows
for the horizontal difference table:

From the mean of the two given functions subtract 1/8 of the mean of the
second differences in the same row and the row above. This result is correct
to third differences inclusive. To obtain the value correct to fifth
differences inclusive, add to the above result 3/128 of the mean of the fourth
differences in the first and second rows above.

By n successive applications of this formula the original intervals are
subdivided into $2^n$ intervals.
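
The first part of the rule (second differences only) may be sketched in Python as follows. The function name is hypothetical; the four ordinates are those of (9) at t = .16, .32, .48, .64 as used in Example 7.

```python
def halve(u_prev, u0, u1, u_next):
    """Middle ordinate between u0 and u1 by the first two terms of (20)."""
    d2_above = u1 - 2 * u0 + u_prev      # Δ²u_{-1}, the row above
    d2_same = u_next - 2 * u1 + u0       # Δ²u_0, the same row
    # One eighth of the mean of the two second differences.
    return (u0 + u1) / 2 - (d2_above + d2_same) / 16


middle = halve(0.063560, 0.125516, 0.184386, 0.238914)
print(round(middle, 6))    # about .155415, the ordinate at t = .40
```

Because the third difference coefficient vanishes at the middle of the interval, the same function is exact for any function of the third degree; for instance, the cubes of -1, 0, 1, 2 give the middle value .125.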


Example 7. Subdivide the original intervals of .16 into intervals of .04 between 
.40 and .80 by two applications of the formula to the integral of the normal curve of 
error given in (9). 


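TABLE X

Left (first application, intervals of .16 halved):

      t      $u_x$       $\Delta u_x$      $\Delta^2 u_x$
     .16   .063560    .061956   -.003086
     .32   .125516    .058870   -.004342
     .40   .155415*
     .48   .184386    .054528   -.005297
     .56   .212252*
     .64   .238914    .049231   -.005904
     .72   .264230*
     .80   .288145    .043327   -.006156
     .88   .310563*
     .96   .331472    .037171
    1.12   .368643

Right (second application, intervals of .08 halved):

      t      $u_x$       $\Delta u_x$      $\Delta^2 u_x$
     .32   .125516    .029899   -.000928
     .40   .155415    .028971   -.001105
     .44   .170028*
     .48   .184386    .027866   -.001204
     .52   .198463*
     .56   .212252    .026662   -.001346
     .60   .225742*
     .64   .238914    .025316   -.001401
     .68   .251744*
     .72   .264230    .023915   -.001497
     .76   .276369*
     .80   .288145    .022418
     .88   .310563

Interpolated (italicized) values are marked with an asterisk.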


The interpolation on the left in Table X is made first, the italicized
numbers being obtained. These numbers, together with the original
numbers from .32 to .88 are then differenced again on the right and a
second interpolation gives the additional italicized numbers. Only
second differences are used in each application, but the results are good
to third differences inclusive because the coefficient of the latter vanishes
in Bessel’s formula when $x = \tfrac12$. Although only second differences and
six decimal places were used, the final results are correct within a unit
in the fifth decimal place.

Errors in tables. The detection of accidental errors in tables is facili- 
tated by constructing the difference columns. This is shown most 
easily by the following table containing the error e in the value of us. 


TABLE XI¹


It will be observed that the error, e, is multiplied in the difference 
column of a given order by the binomial coefficients of the same order 
with alternating signs. Accordingly, when the correct differences ap- 
proach a constant value the succeeding difference column will show up 
a succession of alternating differences closely following in value the 
product of the error and the binomial coefficients of the order of differ- 
ences in that column. 
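
The way a single error spreads through the difference columns may be seen in the following Python sketch. The data are an assumption made for illustration: the cubes of 0 to 8, whose fourth differences vanish, with an error e = 5 added to one entry.

```python
from math import comb


def differences(values, order):
    """Return the differences of the given order of a sequence."""
    for _ in range(order):
        values = [b - a for a, b in zip(values, values[1:])]
    return values


e = 5                                   # illustrative error
column = [x ** 3 for x in range(9)]     # correct fourth differences are zero
column[4] += e                          # one erroneous entry

fourth = differences(column, 4)
pattern = [(-1) ** k * comb(4, k) * e for k in range(5)]
print(fourth)
print(pattern)
```

The fourth difference column is the error multiplied by 1, -4, 6, -4, 1, the binomial coefficients of the fourth order with alternating signs, and the largest entry stands opposite the erroneous value.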

Ordinates not equidistant. All the interpolation formulas given so
far are based on the assumption of equidistant ordinates. If the ordinates
are not equidistant, it is still possible to determine a rational
integral algebraic function which will pass through the ends of these
ordinates; the function so determined defines the interpolated ordinates
for values of the independent variable. Let the ordinates be $u_a, u_b, u_c,
\cdots$, and let
\[ u_x = A + Bx + Cx^2 + Dx^3 + \cdots \]
Then when $a, b, c, \cdots$ are substituted in the above equation, a set of linear
equations is obtained which gives the required values of the coefficients
$A, B, C, \cdots$

This interpolation formula was given by Lagrange in the form
\[
u_x = u_a\,\frac{(x-b)(x-c)\cdots(x-n)}{(a-b)(a-c)\cdots(a-n)}
    + u_b\,\frac{(x-a)(x-c)\cdots(x-n)}{(b-a)(b-c)\cdots(b-n)}
    + \cdots
\tag{22}
\]
and it is usually called Lagrange’s formula.

An inspection shows that the right-hand member is a rational integral
algebraic function of $x$ of degree one less than the number of ordinates


1 Herbert L. Rice, loc. cit., pp. 9-15.


48 HANDBOOK OF MATHEMATICAL STATISTICS 


and it is also evident that it equals $u_a$ when $x$ equals $a$, $u_b$ when $x$
equals $b$, and so on.

This formula may be used to supply the unknown value of the argument
when the function is taken as abscissa, for example, when a logarithm
is given to find the anti-logarithm. Here the abscissa $x$ is the
logarithm and the ordinate $y$ the anti-logarithm. In short, any functional
relation between two variables may be assumed to be determined
by a parabolic curve for a small range and Lagrange’s formula used to
determine the value of the unknown ordinate.¹


Example 8. Given the following table, to find the unknown ordinate corresponding
to the abscissa .86614.


TABLE XII


Applying Lagrange’s formula, with

     a = .84270     $u_a$ = .34134
     b = .85865     $u_b$ = .35083
     c = .88997     $u_c$ = .37076
     x = .86614

     $u_x$ = .35544     Correct value $u_x$ = .35543
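
Example 8 may be verified with a short Python sketch of Lagrange’s formula. The function name is hypothetical; the three pairs of abscissa and ordinate are those listed above.

```python
def lagrange(points, x):
    """Evaluate Lagrange's formula (22) through (abscissa, ordinate) pairs."""
    total = 0.0
    for i, (xi, ui) in enumerate(points):
        term = ui
        for j, (xj, _) in enumerate(points):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


points = [(0.84270, 0.34134), (0.85865, 0.35083), (0.88997, 0.37076)]
print(round(lagrange(points, 0.86614), 5))    # about .35544, as in the text
```

Exchanging the two members of each pair gives the inverse use of the formula described above, in which the function is taken as abscissa.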


Numerous writers have attempted to simplify the notation of the
calculus of differences and present the various types of central and
non-central formulas in a systematic and logical order. References² are
given to some of these papers and there will be found therein further
references which practically cover this field.


1 Alfred Henry, loc. cit., pp. 51-53.

2 W. F. Sheppard, “Central-Difference Formulæ,” Proceedings, London
Mathematical Society, vol. 31, pp. 449-88.

W. F. Sheppard, “Central-Difference Interpolation Formulæ,” Journal of the
Institute of Actuaries, vol. 50, pp. 85-89.

Robert Henderson, “A Practical Interpolation Formula with a Theoretical
Introduction,” Transactions of the Actuarial Society of America, vol. 9, pp. 211-24.

S. A. Joffe, “Interpolation-Formulæ and Central-Difference Notation,”
Transactions of the Actuarial Society of America, vol. 18, pp. 72-98.

S. A. Joffe, “Parallel Proofs of Everett’s, Gauss’s, and Newton’s Central-Difference
Interpolation-Formulæ,” Transactions of the Actuarial Society of America, vol.
20, pp. 423-29.

Rational integral algebraic functions. The nth difference of a rational
integral algebraic function of the nth degree is constant and all higher
differences vanish. Let
\[ u_x = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n. \]
Then
\[ \Delta u_x = n a_n x^{n-1} \]
plus terms of lower degree and the truth of the above statement follows
at once by successive differencing. The constant nth difference is
evidently equal to $a_n\,n!$, or $\omega^n a_n\,n!$ when the interval from $u_0$ to $u_1$ contains
$\omega$ units of the argument of the function. Since
\[ \Delta u_x = u_{x+1} - u_x \]
and
\[ \Delta^2 u_x = \Delta u_{x+1} - \Delta u_x = u_{x+2} - 2u_{x+1} + u_x \]
and
\[ \Delta^n u_{x+1} = \Delta^n u_x + \Delta^{n+1} u_x, \]
it follows easily by mathematical induction that
\[ \Delta^n u_x = u_{x+n} - n\,u_{x+n-1} + n_2\,u_{x+n-2} - \cdots + (-1)^n u_x. \]
If $x = 0$, we have
\[ \Delta^n u_0 = u_n - n\,u_{n-1} + n_2\,u_{n-2} - \cdots + (-1)^n u_0, \tag{23} \]
which expresses the nth difference of $u_0$ in terms of $u_0$ and the n
functions following $u_0$.

This formula¹ is useful in interpolating one or more missing terms in a
series of equidistant terms. For example, if one term is missing, a curve
of degree n - 1 can be passed through the n terms which are given.
The nth differences of this function vanish, hence the missing or unknown
term is found by solving this equation for that term.

If two terms are missing, a curve of degree n - 2 can be passed
through the n - 1 terms which are given. The (n - 1)th differences
of this function vanish, hence two linear equations can be written as
follows:
\[
\begin{aligned}
\Delta^{n-1} u_0 &= u_{n-1} - (n-1)u_{n-2} + (n-1)_2 u_{n-3} - \cdots + (-1)^{n-1} u_0 = 0, \\
\Delta^{n-1} u_1 &= u_n - (n-1)u_{n-1} + (n-1)_2 u_{n-2} - \cdots + (-1)^{n-1} u_1 = 0,
\end{aligned}
\tag{24}
\]
whose solution will give the two missing terms. The extension of this
principle is obvious.
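
The case of a single missing term may be sketched in Python as follows. The function name and the data (the squares of 1 to 5 with the third term omitted) are assumptions made for illustration; the method is that of setting the nth difference (23) equal to zero.

```python
from math import comb


def fill_missing_term(terms):
    """Supply the one missing term (marked None) in a series of n + 1
    equidistant terms u_0 ... u_n by solving Δ^n u_0 = 0."""
    n = len(terms) - 1
    missing = terms.index(None)

    def coeff(k):
        # Coefficient of u_k in (23): (-1)^(n-k) times the binomial n_k.
        return (-1) ** (n - k) * comb(n, k)

    known = sum(coeff(k) * u for k, u in enumerate(terms) if u is not None)
    return -known / coeff(missing)


series = [1, 4, None, 16, 25]
print(fill_missing_term(series))    # 9.0 for these illustrative data
```

Here four terms are given, so a curve of the third degree is implied and the fourth difference is made to vanish; the result is exact because the data lie on a curve of the second degree.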


1 Burn and Brown, Elements of Finite Differences, pp. 23-24.


DERIVATIVES IN TERMS OF DIFFERENCES


Derivatives by Newton’s formula. It is frequently required to
determine the derivative of a function in terms of its finite differences. By
differentiating $u_x$ with respect to $x$ as expressed by Newton’s and other
interpolation formulas given, the first and higher derivatives can be
obtained in terms of the differences of $u_x$. Applying this process to
Newton’s formula (7), the following equations are obtained expressing
the values of the several derivatives in terms of the differences of $u_0$ and
the abscissa $x$. The number of units in the interval $u_0$ to $u_1$ is denoted
by $\omega$.
\[ \omega u'_x = \Delta u_0 + \frac{dx_2}{dx}\Delta^2 u_0 + \frac{dx_3}{dx}\Delta^3 u_0 + \frac{dx_4}{dx}\Delta^4 u_0 + \frac{dx_5}{dx}\Delta^5 u_0 + \cdots \]

Hence,
\[
\begin{aligned}
\omega u'_x = \Delta u_0 &+ (x - \tfrac12)\Delta^2 u_0 + (\tfrac12 x^2 - x + \tfrac13)\Delta^3 u_0 \\
 &+ (\tfrac16 x^3 - \tfrac34 x^2 + \tfrac{11}{12}x - \tfrac14)\Delta^4 u_0 \\
 &+ (\tfrac1{24}x^4 - \tfrac13 x^3 + \tfrac78 x^2 - \tfrac56 x + \tfrac15)\Delta^5 u_0 + \cdots
\end{aligned}
\tag{25}
\]

Similarly,
\[
\begin{aligned}
\omega^2 u''_x = \Delta^2 u_0 &+ (x - 1)\Delta^3 u_0 + (\tfrac12 x^2 - \tfrac32 x + \tfrac{11}{12})\Delta^4 u_0 \\
 &+ (\tfrac16 x^3 - x^2 + \tfrac74 x - \tfrac56)\Delta^5 u_0 + \cdots
\end{aligned}
\tag{26}
\]

\[ \omega^3 u'''_x = \Delta^3 u_0 + (x - \tfrac32)\Delta^4 u_0 + (\tfrac12 x^2 - 2x + \tfrac74)\Delta^5 u_0 + \cdots \tag{27} \]

Example 9. Find the first and second derivatives of the function (9) when t = 1.22.
Direct substitution in (25) and (26) of the differences in Table II with $x = .44$
and $\omega = \tfrac12$ gives for t = 1.22
\[ u'_{.44} = .18884, \qquad u''_{.44} = -.24284. \]
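
The figures of Example 9 may be checked with the following Python sketch, which evaluates the Newton derivative series through fourth differences. The function names are hypothetical; the leading differences of $u_0$ at t = 1.00 are those used in Example 1, and stopping at the fourth difference is an assumption that happens to reproduce the printed values.

```python
def newton_first_derivative(d, x, omega):
    """First derivative from d = [Δu_0, Δ²u_0, Δ³u_0, Δ⁴u_0], as in (25)."""
    total = (d[0]
             + (x - 1 / 2) * d[1]
             + (x ** 2 / 2 - x + 1 / 3) * d[2]
             + (x ** 3 / 6 - 3 * x ** 2 / 4 + 11 * x / 12 - 1 / 4) * d[3])
    return total / omega


def newton_second_derivative(d, x, omega):
    """Second derivative from the same differences, as in (26)."""
    total = (d[1]
             + (x - 1) * d[2]
             + (x ** 2 / 2 - 3 * x / 2 + 11 / 12) * d[3])
    return total / omega ** 2


diffs = [0.09185, -0.04779, 0.02027, -0.00444]
first = newton_first_derivative(diffs, 0.44, 0.5)
second = newton_second_derivative(diffs, 0.44, 0.5)
print(round(first, 5), round(second, 5))    # about .18884 and -.24284
```

Division by $\omega$ and $\omega^2$ converts the derivatives with respect to $x$ into derivatives with respect to t.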


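The values of the derivatives at the ordinate $u_0$ are found by setting
$x = 0$ in (25), (26), (27), and the resulting formulas are

$$\omega u'_0 = \Delta u_0 - \tfrac12\Delta^2u_0 + \tfrac13\Delta^3u_0 - \tfrac14\Delta^4u_0 + \tfrac15\Delta^5u_0 - \cdots \tag{28}$$

$$\omega^2u''_0 = \Delta^2u_0 - \Delta^3u_0 + \tfrac{11}{12}\Delta^4u_0 - \tfrac56\Delta^5u_0 + \cdots \tag{29}$$

$$\omega^3u'''_0 = \Delta^3u_0 - \tfrac32\Delta^4u_0 + \tfrac74\Delta^5u_0 - \cdots \tag{30}$$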
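
Formulas (28), (29), (30) lend themselves to direct computation from a column of tabulated values. The sketch below is illustrative only: function (9) and Table II are not reproduced here, so a hypothetical quartic $u = t^4$ tabulated from $t = 1$ at intervals of $\omega = \tfrac12$ is used instead, and the function names are assumptions. For a polynomial of the fourth degree the differences beyond the fourth vanish, so the truncated formulas are exact.

```python
from fractions import Fraction


def leading_differences(values):
    """Return [u_0, Δu_0, Δ²u_0, ...] for a list of equidistant values."""
    row, out = list(values), []
    while row:
        out.append(row[0])
        row = [b - a for a, b in zip(row, row[1:])]
    return out


def newton_derivatives_at_origin(values, omega):
    """First three derivatives at u_0 by (28), (29), (30), through fifth differences."""
    d = leading_differences(values) + [0] * 6   # pad so d[1]..d[5] always exist
    first = (d[1] - d[2] / 2 + d[3] / 3 - d[4] / 4 + d[5] / 5) / omega
    second = (d[2] - d[3] + Fraction(11, 12) * d[4] - Fraction(5, 6) * d[5]) / omega**2
    third = (d[3] - Fraction(3, 2) * d[4] + Fraction(7, 4) * d[5]) / omega**3
    return first, second, third


omega = Fraction(1, 2)
values = [(1 + k * omega) ** 4 for k in range(6)]      # hypothetical table of t^4
derivatives = newton_derivatives_at_origin(values, omega)
```

Exact rational arithmetic is used so that the agreement with the true derivatives $4t^3$, $12t^2$, $24t$ at $t = 1$ is not obscured by rounding.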

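Example 10. Find the first and second derivatives at $t = 1$ in the function
defined by (9).

This value of $t$ corresponds to $x = 0$ in Table II and substitution of the
differences in the $u_0$ row in (28) and (29) gives

$u'_0 = .24724$,  $u''_0 = -.28852$

Derivatives by Stirling’s formula. When (12) is employed, the following
formulas are obtained expressing the derivatives in terms of the
differences,

$$\begin{aligned}
\omega u'_x = \frac{\Delta u_{-1} + \Delta u_0}{2} &+ x\,\Delta^2u_{-1} + \left(\tfrac12x^2 - \tfrac16\right)\frac{\Delta^3u_{-2} + \Delta^3u_{-1}}{2}\\
&+ \left(\tfrac16x^3 - \tfrac1{12}x\right)\Delta^4u_{-2} + \left(\tfrac1{24}x^4 - \tfrac18x^2 + \tfrac1{30}\right)\frac{\Delta^5u_{-3} + \Delta^5u_{-2}}{2} + \cdots
\end{aligned} \tag{31}$$

$$\begin{aligned}
\omega^2u''_x = \Delta^2u_{-1} &+ x\,\frac{\Delta^3u_{-2} + \Delta^3u_{-1}}{2} + \left(\tfrac12x^2 - \tfrac1{12}\right)\Delta^4u_{-2}\\
&+ \left(\tfrac16x^3 - \tfrac14x\right)\frac{\Delta^5u_{-3} + \Delta^5u_{-2}}{2} + \cdots
\end{aligned} \tag{32}$$

$$\omega^3u'''_x = \frac{\Delta^3u_{-2} + \Delta^3u_{-1}}{2} + x\,\Delta^4u_{-2} + \left(\tfrac12x^2 - \tfrac14\right)\frac{\Delta^5u_{-3} + \Delta^5u_{-2}}{2} + \cdots \tag{33}$$

Example 11. The first and second derivatives of (9) for $t = 1.22$ by (31) and (32)
are $u'_{1.22} = .18992$,  $u''_{1.22} = -.23616$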


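When $x = 0$, Stirling’s formulas for the first three derivatives simplify
to the following:

$$\omega u'_0 = \frac{\Delta u_{-1} + \Delta u_0}{2} - \frac16\left(\frac{\Delta^3u_{-2} + \Delta^3u_{-1}}{2}\right) + \frac1{30}\left(\frac{\Delta^5u_{-3} + \Delta^5u_{-2}}{2}\right) - \cdots \tag{34}$$

$$\omega^2u''_0 = \Delta^2u_{-1} - \tfrac1{12}\Delta^4u_{-2} + \cdots \tag{35}$$

$$\omega^3u'''_0 = \frac{\Delta^3u_{-2} + \Delta^3u_{-1}}{2} - \frac14\left(\frac{\Delta^5u_{-3} + \Delta^5u_{-2}}{2}\right) + \cdots \tag{36}$$

Example 12. The first and second derivatives of (9) for $t = 1$ by (34) and (35)
are $u'_0 = .24278$,  $u''_0 = -.24104$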


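Derivatives by Bessel’s formula. When (14) is differentiated, the
following formulas are obtained for the derivatives at the point $x$.

$$\begin{aligned}
\omega u'_x = \Delta u_0 &+ \left(x - \tfrac12\right)\frac{\Delta^2u_{-1} + \Delta^2u_0}{2} + \left(\tfrac12x^2 - \tfrac12x + \tfrac1{12}\right)\Delta^3u_{-1}\\
&+ \left(\tfrac16x^3 - \tfrac14x^2 - \tfrac1{12}x + \tfrac1{12}\right)\frac{\Delta^4u_{-2} + \Delta^4u_{-1}}{2}\\
&+ \left(\tfrac1{24}x^4 - \tfrac1{12}x^3 + \tfrac1{24}x - \tfrac1{120}\right)\Delta^5u_{-2} + \cdots
\end{aligned} \tag{37}$$

$$\begin{aligned}
\omega^2u''_x = \frac{\Delta^2u_{-1} + \Delta^2u_0}{2} &+ \left(x - \tfrac12\right)\Delta^3u_{-1} + \left(\tfrac12x^2 - \tfrac12x - \tfrac1{12}\right)\frac{\Delta^4u_{-2} + \Delta^4u_{-1}}{2}\\
&+ \left(\tfrac16x^3 - \tfrac14x^2 + \tfrac1{24}\right)\Delta^5u_{-2} + \cdots
\end{aligned} \tag{38}$$

$$\omega^3u'''_x = \Delta^3u_{-1} + \left(x - \tfrac12\right)\frac{\Delta^4u_{-2} + \Delta^4u_{-1}}{2} + \left(\tfrac12x^2 - \tfrac12x\right)\Delta^5u_{-2} + \cdots \tag{39}$$

Example 13. The first and second derivatives of (9) for $t = 1.22$ by (37) and
(38) are $u'_{1.22} = .18968$,  $u''_{1.22} = -.22924$

When $x = 0$, Bessel’s formulas for the first three derivatives may be
written as follows:

$$\begin{aligned}
\omega u'_0 = \Delta u_0 &- \frac12\left(\frac{\Delta^2u_{-1} + \Delta^2u_0}{2}\right) + \frac1{12}\Delta^3u_{-1}\\
&+ \frac1{12}\left(\frac{\Delta^4u_{-2} + \Delta^4u_{-1}}{2}\right) - \frac1{120}\Delta^5u_{-2} + \cdots
\end{aligned} \tag{40}$$

$$\omega^2u''_0 = \frac{\Delta^2u_{-1} + \Delta^2u_0}{2} - \frac12\Delta^3u_{-1} - \frac1{12}\left(\frac{\Delta^4u_{-2} + \Delta^4u_{-1}}{2}\right) + \frac1{24}\Delta^5u_{-2} + \cdots \tag{41}$$

$$\omega^3u'''_0 = \Delta^3u_{-1} - \frac12\left(\frac{\Delta^4u_{-2} + \Delta^4u_{-1}}{2}\right) + \cdots \tag{42}$$

Example 14. The first and second derivatives of (9) for $t = 1$ by (40) and (41)
are

$u'_0 = .24138$,  $u''_0 = -.23824$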
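Derivatives by Everett’s formula. When (14) is differentiated formulas
for the derivatives at the point $x$ are obtained as follows, where
$\xi = 1 - x$:

$$\begin{aligned}
\omega u'_x = \Delta u_0 &+ \left(\tfrac12x^2 - \tfrac16\right)\Delta^2u_0 - \left(\tfrac12\xi^2 - \tfrac16\right)\Delta^2u_{-1}\\
&+ \left(\tfrac1{24}x^4 - \tfrac18x^2 + \tfrac1{30}\right)\Delta^4u_{-1} - \left(\tfrac1{24}\xi^4 - \tfrac18\xi^2 + \tfrac1{30}\right)\Delta^4u_{-2}\\
&+ \left(\tfrac1{720}x^6 - \tfrac1{72}x^4 + \tfrac7{240}x^2 - \tfrac1{140}\right)\Delta^6u_{-2}\\
&- \left(\tfrac1{720}\xi^6 - \tfrac1{72}\xi^4 + \tfrac7{240}\xi^2 - \tfrac1{140}\right)\Delta^6u_{-3}.
\end{aligned} \tag{43}$$

$$\begin{aligned}
\omega^2u''_x = x\,\Delta^2u_0 &+ \xi\,\Delta^2u_{-1}\\
&+ \left(\tfrac16x^3 - \tfrac14x\right)\Delta^4u_{-1} + \left(\tfrac16\xi^3 - \tfrac14\xi\right)\Delta^4u_{-2}\\
&+ \left(\tfrac1{120}x^5 - \tfrac1{18}x^3 + \tfrac7{120}x\right)\Delta^6u_{-2} + \left(\tfrac1{120}\xi^5 - \tfrac1{18}\xi^3 + \tfrac7{120}\xi\right)\Delta^6u_{-3}.
\end{aligned} \tag{44}$$

$$\begin{aligned}
\omega^3u'''_x = \Delta^2u_0 &- \Delta^2u_{-1}\\
&+ \left(\tfrac12x^2 - \tfrac14\right)\Delta^4u_{-1} - \left(\tfrac12\xi^2 - \tfrac14\right)\Delta^4u_{-2}\\
&+ \left(\tfrac1{24}x^4 - \tfrac16x^2 + \tfrac7{120}\right)\Delta^6u_{-2} - \left(\tfrac1{24}\xi^4 - \tfrac16\xi^2 + \tfrac7{120}\right)\Delta^6u_{-3}.
\end{aligned} \tag{45}$$

Example 15. The first and second derivatives of (9) for $t = 1.22$ by (43) and
(44) are

$u'_{1.22} = .18956$,  $u''_{1.22} = -.22980$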


When $x = 0$, Everett’s formulas for the first three derivatives take
the following form:

$$\begin{aligned}
\omega u'_0 = \Delta u_0 &- \tfrac16\Delta^2u_0 - \tfrac13\Delta^2u_{-1}\\
&+ \tfrac1{30}\Delta^4u_{-1} + \tfrac1{20}\Delta^4u_{-2}\\
&- \tfrac1{140}\Delta^6u_{-2} - \tfrac1{105}\Delta^6u_{-3}.
\end{aligned} \tag{46}$$

$$\omega^2u''_0 = \Delta^2u_{-1} - \tfrac1{12}\Delta^4u_{-2} + \tfrac1{90}\Delta^6u_{-3}. \tag{47}$$

$$\begin{aligned}
\omega^3u'''_0 = \Delta^2u_0 &- \Delta^2u_{-1}\\
&- \tfrac14\Delta^4u_{-1} - \tfrac14\Delta^4u_{-2}\\
&+ \tfrac7{120}\Delta^6u_{-2} + \tfrac1{15}\Delta^6u_{-3}.
\end{aligned} \tag{48}$$

Example 16. The first and second derivatives of (9) for $t = 1$ by (46) and (47)
are:

$u'_0 = .24166$,  $u''_0 = -.24104$

In examples 9 to 16 the first and second derivatives have been computed to fourth
differences inclusive by employing four different interpolation formulas. The
results are brought together for comparison in tabular form.


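TABLE XIII

Second derivatives of (9); errors in units of the fifth decimal place.

FORMULA     | u'' at t = 1 | ERROR  | u'' at t = 1.22 | ERROR
True Value  |              |        |                 |      0
Everett’s   |              |        |                 |    144
Bessel’s    |              |        |                 |    200
Stirling’s  |              |        |                 |  − 492
Newton’s    | − .28852     | − 4655 | − .24284        | − 1160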

It will be observed that central-difference formulas give better results 
than Newton’s and that the choice as between the former is in the order 
Everett, Bessel, Stirling. While this will not always be true, it may be 
regarded as confirming the remarks made on page 40 concerning choice 
of interpolation formulas. 

A very full treatment of derivatives of tabular functions in terms of 
differences will be found in Rice. 


SUMMATION FORMULAS 


Finite integrals in terms of differences. If the first column to the
left of the column containing the function $u_x$ is expanded by Newton’s
formula, the following result is obtained:

$$\Delta^{-1}u_x = \Delta^{-1}u_0 + x_1u_0 + x_2\Delta u_0 + x_3\Delta^2u_0 + \cdots$$

Taking into account the principle involved in statement (e) on page 35,
we are led easily to the following expression for the sum of the terms in
the $u_x$ column from $x = 0$ to $x - 1$, inclusive,

$$\Delta^{-1}u_x - \Delta^{-1}u_0 = u_0 + u_1 + u_2 + \cdots + u_{x-1} = \sum_{0}^{x-1}u_x = x_1u_0 + x_2\Delta u_0 + x_3\Delta^2u_0 + x_4\Delta^3u_0 + \cdots \tag{49}$$

Similarly, the sum of $n$ terms beginning with $u_a$ can be expressed as
follows:

$$\sum_{x=a}^{a+n-1}u_x = u_a + u_{a+1} + \cdots + u_{a+n-1} = n\,u_a + n_2\Delta u_a + n_3\Delta^2u_a + n_4\Delta^3u_a + \cdots \tag{50}$$

1 Herbert L. Rice, loc. cit., Chap. III, pp. 96-129.

In this manner the finite integral is expressed in terms of the finite
differences. It is most useful when $u_x$ is a rational integral algebraic
function, for then the differences eventually vanish. The following
example will illustrate an application of the formula.


Example 17. Compute a table of the first hundred values of the coefficient of
$\Delta^4u_0$ in (25),

$$y = x'_4 = \tfrac16x^3 - \tfrac34x^2 + \tfrac{11}{12}x - \tfrac14,$$

when the argument $x$ proceeds by intervals of .01 from 0 to .99 inclusive, and check
the results by (50).
The function $x'_4$ is first computed for the four values .00, .01, .02, .03 and the first
three leading differences of $u_0$ formed as in the following table.


TABLE XIV

  x  |     x'_4     |   Δx'_4   |   Δ²x'_4   |  Δ³x'_4
 .00 | − .25000000  | .00909183 | − .000149  | .000001
 .01 | − .24090816  | .00894283 | − .000148  |
 .02 | − .23196533  | .00879483 |            |
 .03 | − .22317050  |           |            |


Since $x'_4$ is of the third degree, $\Delta^3x'_4 = .000001$ is constant and $x'_4$ can
be computed by intervals of .01 to $x = .99$ by successive additions.
If these computed values are added at any stage in the work, the sum
should check with the result obtained by substitution in (50). For
example, the sum of the first hundred terms, setting $a = 0$ and $n = 100$,
should be

$$\begin{aligned}
\sum_{0}^{99}x'_4 &= 100\,u_0 + 100_2\,\Delta u_0 + 100_3\,\Delta^2u_0 + 100_4\,\Delta^3u_0\\
&= 100(-.25) + 4950(.00909183) + 161700(-.000149) + 3921225(.000001)\\
&= -.1675.
\end{aligned}$$

When $x'_4$ is computed by successive additions, a simple check on the
work up to any point could be obtained of course by an independent
computation of the function. If, however, the table had been constructed
by computing the function independently for each individual
value of the argument $x$, or if it were not known how the table had been
computed, the check by (50) would be simple and effective except for
compensated errors.
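
The check of Example 17 can be repeated in exact arithmetic. The following Python sketch is an illustration added here, not the computation of the text; the names `coefficient`, `table`, and `leading` are assumptions. It tabulates the hundred values, forms the leading differences from the first four entries, and compares formula (50) with the direct sum.

```python
from fractions import Fraction
from math import comb


def coefficient(x):
    """Coefficient of the fourth difference in (25), the x'_4 of Example 17."""
    return x**3 / 6 - 3 * x**2 / 4 + 11 * x / 12 - Fraction(1, 4)


step = Fraction(1, 100)
table = [coefficient(k * step) for k in range(100)]     # x = .00, .01, ..., .99

# Leading differences u_0, Δu_0, Δ²u_0, Δ³u_0 from the first four entries.
row, leading = table[:4], []
while row:
    leading.append(row[0])
    row = [b - a for a, b in zip(row, row[1:])]

# Formula (50) with a = 0, n = 100: n u_0 + n_2 Δu_0 + n_3 Δ²u_0 + n_4 Δ³u_0,
# where n_r is the binomial coefficient C(n, r).
by_formula = sum(comb(100, r + 1) * d for r, d in enumerate(leading))
direct = sum(table)
```

With unrounded differences the two sums agree exactly at −.1675; the seven- and eight-place differences of Table XIV reproduce the same figure to the places shown.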

Lubbock’s formula. The formula proposed by Lubbock¹ expresses
a relation between the sum of a series of original and interpolated
ordinates, and the differences of the original ordinates between which the
interpolations are made. It is written as follows:

$$\begin{aligned}
\sum_{x=0}^{mn-1}u_{x/n} &= u_0 + u_{\frac1n} + u_{\frac2n} + \cdots + u_1 + u_{1+\frac1n} + \cdots + u_{m-\frac1n}\\
&= n(u_0 + u_1 + \cdots + u_{m-1})\\
&\quad + \frac{n-1}{2}(u_m - u_0) - \frac{n^2-1}{12n}(\Delta u_m - \Delta u_0) + \frac{n^2-1}{24n}(\Delta^2u_m - \Delta^2u_0)\\
&\quad - \frac{(n^2-1)(19n^2-1)}{720n^3}(\Delta^3u_m - \Delta^3u_0) + \frac{(n^2-1)(9n^2-1)}{480n^3}(\Delta^4u_m - \Delta^4u_0)\\
&\quad - \frac{(n^2-1)(863n^4 - 145n^2 + 2)}{60480n^5}(\Delta^5u_m - \Delta^5u_0)\\
&\quad + \frac{(n^2-1)(275n^4 - 61n^2 + 2)}{24192n^5}(\Delta^6u_m - \Delta^6u_0) - \cdots
\end{aligned} \tag{51}$$

1 J. W. Lubbock, “On the Comparison of Various Tables of Annuities,”
Transactions of the Cambridge Philosophical Society, vol. 3, pp. 321-41. Reprinted in
Journal of the Institute of Actuaries, vol. 5, pp. 277-92.


The original ordinates which are given, together with the differences
of the first and last ordinates, are in the right-hand member. It will be
observed that $n - 1$ ordinates are interpolated between each pair of the
original ordinates from the ordinate $u_0$ up to the ordinate $u_m$. If the
function to be summed vanishes for the ordinates beginning with $u_m$, then
$u_m$ and all its differences vanish and the formula reduces to the following:

$$\begin{aligned}
\sum_{x=0}^{mn-1}u_{x/n} &= u_0 + u_{\frac1n} + u_{\frac2n} + \cdots + u_1 + u_{1+\frac1n} + \cdots + u_{m-\frac1n}\\
&= n(u_0 + u_1 + u_2 + \cdots + u_m) - \frac{n-1}{2}u_0 + \frac{n^2-1}{12n}\Delta u_0 - \frac{n^2-1}{24n}\Delta^2u_0\\
&\quad + \frac{(n^2-1)(19n^2-1)}{720n^3}\Delta^3u_0 - \frac{(n^2-1)(9n^2-1)}{480n^3}\Delta^4u_0\\
&\quad + \frac{(n^2-1)(863n^4 - 145n^2 + 2)}{60480n^5}\Delta^5u_0\\
&\quad - \frac{(n^2-1)(275n^4 - 61n^2 + 2)}{24192n^5}\Delta^6u_0 + \cdots
\end{aligned} \tag{52}$$

Formula (52) is also used when $u_m$ approaches zero asymptotically
because then $u_m$ and its differences become small and may be neglected.

Lubbock’s formula is very useful in practical calculations where it is 
required to insert a given number of ordinates between every pair of 
original ordinates and obtain the sum of the original and interpolated 
ordinates. 




Example 18. Given the value of the reciprocal of every fifth integer from 51 to
131, to find the sum of the reciprocals of all integers from 51 to 100, both inclusive.
Here $n = 5$ and $m = 10$.

TABLE XV

      |  n  |   1/n   |     Δ     |   Δ²    |     Δ³    |     Δ⁴    |     Δ⁵    |     Δ⁶
u_0   |  51 | .019608 | − .001751 | .000287 | − .000064 |   .000015 |   .000000 | − .000006
u_1   |  56 | .017857 | − .001464 | .000223 | − .000049 |   .000015 | − .000006 |
u_2   |  61 | .016393 | − .001241 | .000174 | − .000034 |   .000009 |           |
u_3   |  66 | .015152 |           |         |           |           |           |
u_4   |  71 | .014085 |           |         |           |           |           |
u_5   |  76 | .013158 |           |         |           |           |           |
u_6   |  81 | .012346 |           |         |           |           |           |
u_7   |  86 | .011628 |           |         |           |           |           |
u_8   |  91 | .010989 |           |         |           |           |           |
u_9   |  96 | .010417 |           |         |           |           |           |
u_10  | 101 | .009901 | − .000467 | .000042 | − .000005 | − .000001 |   .000006 | − .000016
u_11  | 106 | .009434 | − .000425 | .000037 | − .000006 |   .000005 | − .000010 |
u_12  | 111 | .009009 | − .000388 | .000031 | − .000001 | − .000005 |           |
u_13  | 116 | .008621 | − .000357 | .000030 | − .000006 |           |           |
u_14  | 121 | .008264 |           |         |           |           |           |
u_15  | 126 | .007937 | − .000303 |         |           |           |           |
u_16  | 131 | .007634 |           |         |           |           |           |


Substituting in formula (51), the sum of the reciprocals of the integers
from 51 to 100 is found to be .688172, which is correct to six places of
decimals.

Values of the coefficients to eight places of decimals in Lubbock’s
formula for the first six differences from $n = 2$ to $n = 12$ are given by
Glover.¹
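
The substitution of Example 18 may be sketched as follows. This Python fragment is an added illustration under stated assumptions: the function names are hypothetical, and the reciprocals are carried as exact fractions rather than as the six-place values of Table XV, so the differences differ slightly from those tabulated. It evaluates (51) through sixth differences with $n = 5$, $m = 10$ and compares the result with the direct sum.

```python
from fractions import Fraction


def leading_differences(values):
    """Return [u, Δu, Δ²u, ...] at the first entry of a list of equidistant values."""
    row, out = list(values), []
    while row:
        out.append(row[0])
        row = [b - a for a, b in zip(row, row[1:])]
    return out


def lubbock_sum(u, n, m):
    """Formula (51) through sixth differences; u(k) is the k-th original ordinate."""
    n = Fraction(n)
    q = n * n - 1
    coefficients = [
        (n - 1) / 2,
        -q / (12 * n),
        q / (24 * n),
        -q * (19 * n**2 - 1) / (720 * n**3),
        q * (9 * n**2 - 1) / (480 * n**3),
        -q * (863 * n**4 - 145 * n**2 + 2) / (60480 * n**5),
        q * (275 * n**4 - 61 * n**2 + 2) / (24192 * n**5),
    ]
    d0 = leading_differences([u(k) for k in range(7)])          # at u_0
    dm = leading_differences([u(k) for k in range(m, m + 7)])   # at u_m
    total = n * sum(u(k) for k in range(m))
    for c, at_m, at_0 in zip(coefficients, dm, d0):
        total += c * (at_m - at_0)
    return total


def reciprocal(k):
    """Original ordinates of Example 18: u_k = 1/(51 + 5k)."""
    return Fraction(1, 51 + 5 * k)


estimate = lubbock_sum(reciprocal, 5, 10)
exact = sum(Fraction(1, k) for k in range(51, 101))
```

Only seventeen reciprocals enter the formula, against fifty in the direct summation, which is the saving of labor the formula is intended to secure.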

Woolhouse’s formula. The Euler-Maclaurin expansion expresses a
relation between the sum of the initial, terminal, and interpolated
ordinates, the area under the curve as determined by the definite integral
and the successive derivatives at the initial and terminal ordinates of
the area in question. It may be written as follows:

$$\begin{aligned}
\frac1m\sum_{x=0}^{mn}u_{x/m} &= \frac1m\left(u_0 + u_{\frac1m} + u_{\frac2m} + \cdots + u_n\right)\\
&= \int_0^n u_x\,dx + \frac1{2m}(u_0 + u_n) - \frac1{12m^2}\left(\frac{du_0}{dx} - \frac{du_n}{dx}\right)\\
&\quad + \frac1{720m^4}\left(\frac{d^3u_0}{dx^3} - \frac{d^3u_n}{dx^3}\right) - \frac1{30240m^6}\left(\frac{d^5u_0}{dx^5} - \frac{d^5u_n}{dx^5}\right) + \cdots
\end{aligned} \tag{53}$$

The initial ordinate is $u_0$, the terminal is $u_n$, and the interpolated
ordinates are at intervals of $\frac1m$ between $u_0$ and $u_n$. The left-hand
member of (53) is the sum of all these ordinates divided by $m$.


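1 James W. Glover, Tables of Applied Mathematics, p. 430.

If the extreme ordinate $u_n$ and the derivatives of $u_x$ at $x = n$ vanish,
(53) simplifies to the form:

$$\frac1m\sum_{x=0}^{mn}u_{x/m} = \int_0^n u_x\,dx + \frac1{2m}u_0 - \frac1{12m^2}\frac{du_0}{dx} + \frac1{720m^4}\frac{d^3u_0}{dx^3} - \frac1{30240m^6}\frac{d^5u_0}{dx^5} + \cdots \tag{54}$$

By setting $m = 1$ in (53) and (54) and subtracting the resulting
equations from (53) and (54), respectively, the following important relations
are obtained:

$$\begin{aligned}
\frac1m\sum_{x=0}^{mn}u_{x/m} = \sum_{x=0}^{n}u_x &- \frac{m-1}{2m}(u_0 + u_n) + \frac{m^2-1}{12m^2}\left(\frac{du_0}{dx} - \frac{du_n}{dx}\right)\\
&- \frac{m^4-1}{720m^4}\left(\frac{d^3u_0}{dx^3} - \frac{d^3u_n}{dx^3}\right) + \frac{m^6-1}{30240m^6}\left(\frac{d^5u_0}{dx^5} - \frac{d^5u_n}{dx^5}\right) - \cdots
\end{aligned} \tag{55}$$

$$\begin{aligned}
\frac1m\sum_{x=0}^{mn}u_{x/m} = \sum_{x=0}^{n}u_x &- \frac{m-1}{2m}u_0 + \frac{m^2-1}{12m^2}\frac{du_0}{dx} - \frac{m^4-1}{720m^4}\frac{d^3u_0}{dx^3}\\
&+ \frac{m^6-1}{30240m^6}\frac{d^5u_0}{dx^5} - \cdots
\end{aligned} \tag{56}$$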


These formulas express a relation between the sum of the ordinates
separated by unit distance and the ordinates separated by $\frac1m$th of a
unit; the latter sum is expressed in terms of the former, the derivatives
at the initial and terminal ordinates, and the number of intervals
between unit ordinates made by the interpolated ordinates, namely, $m$.

Formulas (55) and (56) are commonly referred to as Woolhouse’s
formulas, probably because Woolhouse¹ published a number of papers
in which he developed the formulas and showed their application to
important problems in life contingencies.
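
A numerical check of (55) can be made on any function whose derivatives are known. The sketch below is an added illustration, not an example from the text: the function $u_x = e^{-x/2}$, the limits $n = 10$, $m = 4$, and all names are assumptions. It compares the left member of (55), summed directly, with the right member carried through the fifth-derivative term.

```python
import math


def subdivided_mean(u, n, m):
    """Left member of (55): (1/m) times the sum of u at 0, 1/m, 2/m, ..., n."""
    return sum(u(k / m) for k in range(m * n + 1)) / m


def woolhouse(u, d1, d3, d5, n, m):
    """Right member of (55), through the fifth-derivative term."""
    unit_sum = sum(u(k) for k in range(n + 1))
    return (unit_sum
            - (m - 1) / (2 * m) * (u(0) + u(n))
            + (m**2 - 1) / (12 * m**2) * (d1(0) - d1(n))
            - (m**4 - 1) / (720 * m**4) * (d3(0) - d3(n))
            + (m**6 - 1) / (30240 * m**6) * (d5(0) - d5(n)))


c = 0.5                                        # hypothetical decay constant


def u(x):
    return math.exp(-c * x)


def d1(x):
    return -c * math.exp(-c * x)               # first derivative of u


def d3(x):
    return -c**3 * math.exp(-c * x)            # third derivative of u


def d5(x):
    return -c**5 * math.exp(-c * x)            # fifth derivative of u


direct = subdivided_mean(u, 10, 4)
approximate = woolhouse(u, d1, d3, d5, 10, 4)
```

For this function the neglected terms are of the order of a few units in the ninth decimal place, so the two members agree closely; for functions with rapidly growing derivatives the series must be used with more caution.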

Examples illustrating the application of the formulas of Lubbock and 
Woolhouse are given in the Textbook of the Institute of Actuaries, Part 
II, by George King, pp. 467-80, also in Calculus and Probability, by 
Alfred Henry, pp. 114-19. The above example 18 was chosen merely to 
illustrate the process. The value of these formulas is of course chiefly 
apparent where the direct calculation of the result would involve exces- 
sive labor. 

1 W. S. B. Woolhouse, “On Interpolation, Summation, and the Adjustment of
Numerical Tables,” Journal of the Institute of Actuaries, vol. 11, pp. 61-88, pp. 301-22;
vol. 12, pp. 136-76.
“On an Improved Theory of Annuities and Assurances,” Journal of the Institute
of Actuaries, vol. 15, pp. 95-125.

GRADUATION AND SMOOTHING FORMULAS


Graduation by averaging. When observed values are plotted and 
the ends of the ordinates joined by right lines, the broken polygon is 
frequently quite irregular. At the same time it may be known from the 
nature of the observed character that the variation is continuous. In 
such cases it may be desirable to graduate or smooth the ordinates so 
that the variations in the broken polygon are not so wide. The popula- 
tion by ages, always overstated at ages which are multiples of 5, is an 
example of this kind of irregularity. 

An obvious method of graduation is to replace each term, $u_n$, in a
series by the average of the term and its two adjacent terms, namely,

$$\tfrac13(u_{n-1} + u_n + u_{n+1}).$$

Denoting the process of summing in threes by the operator [3], this
average may be written $\tfrac13[3]u_n$. For example,

$$[3]u_0 = u_{-1} + u_0 + u_1,$$
$$[5]u_6 = u_4 + u_5 + u_6 + u_7 + u_8.$$

These operators may be applied in succession in any desired order.
Averaging in fives,

$$\begin{aligned}
\tfrac15[5]u_n &= \tfrac15(u_{n-2} + u_{n-1} + u_n + u_{n+1} + u_{n+2})\\
&= \tfrac15\{u_n + (u_{n-1} + u_{n+1}) + (u_{n-2} + u_{n+2})\}\\
&= \tfrac15(1 + \gamma_1 + \gamma_2)u_n,
\end{aligned} \tag{57}$$

if we define the operator $\gamma_k$ as follows:

$$\gamma_ku_n = u_{n-k} + u_{n+k}.$$

The graduation may be improved by applying the operator $\tfrac15[5]$ again.
The result is

$$\begin{aligned}
\tfrac1{25}[5]^2u_n &= \tfrac1{25}\{5u_n + 4(u_{n-1} + u_{n+1}) + 3(u_{n-2} + u_{n+2}) + 2(u_{n-3} + u_{n+3}) + (u_{n-4} + u_{n+4})\}\\
&= \tfrac1{25}(5 + 4\gamma_1 + 3\gamma_2 + 2\gamma_3 + \gamma_4)u_n.
\end{aligned} \tag{58}$$

If the process of averaging is repeated a third time, the graduated
term becomes

$$\begin{aligned}
\tfrac1{125}[5]^3u_n &= \tfrac1{125}\{19u_n + 18(u_{n-1} + u_{n+1}) + 15(u_{n-2} + u_{n+2}) + 10(u_{n-3} + u_{n+3})\\
&\qquad\quad + 6(u_{n-4} + u_{n+4}) + 3(u_{n-5} + u_{n+5}) + (u_{n-6} + u_{n+6})\}\\
&= \tfrac1{125}(19 + 18\gamma_1 + 15\gamma_2 + 10\gamma_3 + 6\gamma_4 + 3\gamma_5 + \gamma_6)u_n.
\end{aligned} \tag{59}$$

It is easy to show that graduation by successive application of
averaging operators of the form $\tfrac1k[k]$ will leave an arithmetic
progression unchanged.

When first differences are not constant, better results are obtained by
employing more powerful smoothing formulas.¹

Woolhouse, Higham, and Spencer graduation formulas. Woolhouse
proposed the first graduation formula of a more general type:

$$u'_n = \tfrac1{125}(25 + 24\gamma_1 + 21\gamma_2 + 7\gamma_3 + 3\gamma_4 - 2\gamma_6 - 3\gamma_7)u_n, \tag{60}$$

where $u'_n$ denotes the graduated value. It was derived by assuming $u_n$
to lie on four parabolas of the second degree defined by the sets

$$(u_{n-7}, u_{n-2}, u_{n+3}),\quad (u_{n-6}, u_{n-1}, u_{n+4}),\quad (u_{n-4}, u_{n+1}, u_{n+6}),\quad (u_{n-3}, u_{n+2}, u_{n+7}).$$

The average of $u_n$ and the four values of $u_n$ so obtained was taken as the
graduated value. This formula leaves unchanged the terms of a series
whose third differences are constant. It employs fifteen terms, seven
on each side of the central term, to determine the graduated value of
the central term: it can be condensed in its application to the following
symbolic form:

$$u'_n = \frac{[5]^3}{125}\{10 - 3[3]\}u_n. \tag{61}$$

The graduation formula of Higham involves seventeen terms and is
written

$$u'_n = \frac{[5]^3}{125}\{[3] - \gamma_2\}u_n. \tag{62}$$

Spencer’s twenty-one term formula is perhaps the one most frequently
employed when the series to be smoothed contains a large number of
terms. Its symbolic form is

$$u'_n = \frac{[5]^2[7]}{350}\{1 + [3] - \gamma_3\}u_n. \tag{63}$$

The formulas of Higham and Spencer leave unchanged a series with
constant third differences.²
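
The symbolic forms (58) to (63) can be expanded into explicit weights by multiplying out the summation operators. The following Python sketch is an added illustration with assumed names (`convolve`, `chain`, `graduate`); each operator is represented by its list of coefficients, and successive application corresponds to convolution of the lists.

```python
def convolve(a, b):
    """Coefficients of the product of two summation operators."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def chain(*factors):
    """Apply several operators in succession."""
    weights = [1]
    for factor in factors:
        weights = convolve(weights, factor)
    return weights


five, seven = [1] * 5, [1] * 7
twice = chain(five, five)                                     # (58), divisor 25
thrice = chain(five, five, five)                              # (59), divisor 125
woolhouse = chain(five, five, five, [-3, 7, -3])              # (61): {10 - 3[3]}
higham = chain(five, five, five, [-1, 1, 1, 1, -1])           # (62): {[3] - gamma_2}
spencer = chain(five, five, seven, [-1, 0, 1, 2, 1, 0, -1])   # (63): {1 + [3] - gamma_3}


def graduate(series, weights, divisor, centre):
    """Graduated value of series[centre] by a symmetric weighted average."""
    half = len(weights) // 2
    window = series[centre - half:centre + half + 1]
    return sum(w * u for w, u in zip(weights, window)) / divisor


progression = [7 + 3 * k for k in range(25)]   # hypothetical arithmetic progression
cubic = [k**3 for k in range(25)]              # series with constant third differences
```

Expanding (61) in this way reproduces the coefficients of (60), and a series with constant third differences passes through the Woolhouse, Higham, and Spencer formulas unchanged, while simple repeated averaging preserves only an arithmetic progression.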

The following example of graduation of population statistics of negro 
females in the original registration states, 1910, by Spencer’s formula 
illustrates the application of these methods. 


1 Corneille L. Landre, Mathematisch-technische Kapitel zur Lebensversicherung,
second edition, pp. 69-83.

W. S. B. Woolhouse, “Explanation of New Method of Adjusting Mortality
Tables,” Journal of the Institute of Actuaries, vol. 15, pp. 389-410.

2 J. A. Higham, “On the Adjustment or Graduation of Mortality Tables,” Journal
of the Institute of Actuaries, vol. 23, pp. 335-52.

J. Spencer, “On the Graduation of the Rates of Sickness and Mortality Presented,
etc.,” Journal of the Institute of Actuaries, vol. 38, pp. 334-43.

TABLE XVI

AGE       | POPULATION |  [5](7)  | (1/10)[5](8)
INTERVAL  |    (1)     |   (8)    |     (9)
15-16     |    3380    |          |
16-17     |    3862    |          |
17-18     |    3892    |          |
18-19     |    4749    |          |
19-20     |    4819    |          |
20-21     |    5477    |          |
21-22     |    5044    |          |
22-23     |    5991    |          |
23-24     |    6321    |  12562   |
24-25     |    6675    |  12972   |
25-26     |    7140    |  13278   |    6512
26-27     |    6252    |  13269   |    6505
27-28     |    5599    |  13042   |    6398
28-29     |    6851    |  12490   |    6186
29-30     |    5558    |  11898   |    5908
30-31     |    7567    |  11159   |    5585
31-32     |    3570    |  10492   |    5274
32-33     |    4971    |   9810   |    4991
33-34     |    4127    |   9378   |    4780
34-35     |    4332    |   9069   |    4634
35-36     |    5856    |   9055   |    4558
36-37     |    4001    |   9030   |    4504
37-38     |    3521    |   9052   |    4445
38-39     |    5106    |   8837   |    4321
39-40     |    4077    |   8474   |    4129
40-41     |    5946    |   7817   |
41-42     |    2053    |   7105   |
42-43     |    3154    |          |
43-44     |    2391    |          |
44-45     |    2133    |          |
45-46     |    3763    |          |
46-47     |    1855    |          |
47-48     |    1859    |          |
48-49     |    2655    |          |
49-50     |    2352    |          |

Column (1) is the original population and Column (9) the graduated
series. It will be noted that the smoothing process removes the
exaggerated population returns at ages 25, 30, 35, and distributes the
excess to the other ages. This distribution is such that the quinquennial
groups or sums in fives are not greatly changed.
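
The graduated value for a single age can be checked by applying the twenty-one explicit weights of Spencer’s formula directly to Column (1). The fragment below is an added sketch; the weights are those obtained by expanding (63), and the variable names are assumptions. It uses the populations for ages 15-16 to 35-36, which are the twenty-one terms centred on age 25-26.

```python
# Numerators of Spencer's 21-term formula (divisor 350), obtained by expanding (63).
spencer_weights = [-1, -3, -5, -5, -2, 6, 18, 33, 47, 57, 60,
                   57, 47, 33, 18, 6, -2, -5, -5, -3, -1]

# Column (1) of Table XVI, ages 15-16 through 35-36.
population = [3380, 3862, 3892, 4749, 4819, 5477, 5044, 5991, 6321, 6675, 7140,
              6252, 5599, 6851, 5558, 7567, 3570, 4971, 4127, 4332, 5856]

# Graduated value for the central age, 25-26.
graduated = sum(w * p for w, p in zip(spencer_weights, population)) / 350
```

The result, about 6511.6, agrees with the entry 6512 of Column (9); values computed in one step may differ by a unit from those of the table, which were obtained by successive summations with intermediate rounding.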

The important investigations in this field are found mostly in actuarial 
literature, probably because of the many useful applications arising in 
graduation and smoothing of observed data derived from experience on 
insured lives. References are given to a number of papers which may 
fairly be considered as bringing the treatment of this subject up to date.¹


1 Robert Henderson, Graduation of Mortality and Other Tables, Actuarial Studies, 
No. 4, pp. 23-42. 

George King, “On the Error Introduced into Mortality Tables by Summation
Formulas of Graduation,” Journal of the Institute of Actuaries, vol. 41, pp. 54-96.

George King, “Notes on Summation Formulas of Graduation with Certain New
Formulas for Consideration,” Journal of the Institute of Actuaries, vol. 41, pp. 530-65.

G. J. Lidstone, “On the Rationale of Formulas for Graduation by Summation,”
Journal of the Institute of Actuaries, vol. 41, pp. 348-60, and vol. 42, pp. 106-41.


CHAPTER IV 


CURVE-FITTING BY THE METHOD OF LEAST SQUARES 
AND THE METHOD OF MOMENTS 


By E. V. HUNTINGTON 
THE PROBLEM OF CURVE-FITTING 


Suppose $Y_1, Y_2, \cdots Y_n$ are the ordinates of an empirical curve
corresponding to the values $x_1, x_2, \cdots x_n$; and let it be required to find
a mathematical curve $y = f(x, a, b, c, \cdots)$ which shall represent the
empirical curve as closely as possible.


                                            |          STRAIGHT LINE
TYPE OF EQUATION                            | Abscissa        | Ordinate
(1)   y = a + bx                            | x               | Y
(2)   y = be^{ax}                           | x               | log Y
(3)   y = ax^b                              | log x           | log Y
(4)   y = a + b/x                           | 1/x             | Y
(5)   y = x/(a + bx)                        | x               | x/Y
(6)   y = c + be^{ax}                       | x               | log (ΔY/Δx)
(7)   y = c + ax^b                          | log x           | log (ΔY/Δx)
(8)   y = c + b/(x − a)                     | x − x_0         | (x − x_0)/(Y − Y_0)
(9)   y = c + x/(a + bx)                    | x               | (x − x_0)/(Y − Y_0)
(10)  y = d + cx + be^{ax}                  | x               | log [Δ²Y/(Δx)²]
(11)  y = d c^x b^{a^x}                     | x               | log [Δ²(log Y)/(Δx)²]
(12)¹ y = be^{ax} + de^{cx}                 | (Y_{k+1})/Y_k   | (Y_{k+2})/Y_k
(13)¹ y = e^{ax}(d cos bx + c sin bx)       | (Y_{k+1})/Y_k   | (Y_{k+2})/Y_k
(14)  y = a_0 + a_1x + a_2x² + ··· + a_nx^n | Here the differences of the nth order, Δ^nY, are constant.
      The normal curve, Pearson’s curves,   | See Chapter VII.
      other frequency curves.               |
      Periodic curves: y = b sin ax, etc.   | For methods of harmonic analysis, see Running, Lipka, etc.

1 In (12) and (13) the values of x are supposed to be equidistant, and Y_k, Y_{k+1}, Y_{k+2} are
consecutive values of Y corresponding to x_k, x_{k+1}, x_{k+2}. If M² + 4B is positive, use (12); if negative, use (13);
where M = slope and B = intercept of the plotted straight line on the Y-axis.



The problem of curve-fitting consists of two parts: (1) selecting the
most convenient type of equation $y = f(x, a, b, c, \cdots)$; and (2)
determining the parameters, $a, b, c, \cdots$, in the selected equation so that,
for a given set of values $x_1, x_2, \cdots x_n$, the computed values, $y_1, y_2, \cdots
y_n$, shall agree as closely as possible with the observed values, $Y_1, Y_2,
\cdots Y_n$.

The selection of the type of mathematical equation to be used in any
given case is usually the most perplexing part of the problem. The
preceding table of the most common types may be of assistance.¹

Plot the points indicated in any row of the table; if these points lie 
approximately on a straight line, select the equation indicated. 

Equations (2) to (14) may also be written in the following straight-line
forms, where the quantities in square brackets [ ] suggest the abscissae
and ordinates of the points to be plotted, and the coefficients give the
slope of the line and its intercept on the y-axis (see method 2, below).


(2′) [log y] = log b + (a log e)[x].
(3′) [log y] = log a + b[log x].
(4′) [y] = a + b[1/x].

(5′) [x/y] = a + b[x].

(6′) [log (dy/dx)] = log (ab) + (a log e)[x].
(7′) [log (dy/dx)] = log (ab) + (b − 1)[log x].

(8′) $\left[\dfrac{x - x_0}{y - y_0}\right] = -\dfrac{(a - x_0)^2}{b} + \dfrac{a - x_0}{b}\,[x - x_0].$

(9′) $\left[\dfrac{x - x_0}{y - y_0}\right] = (a + bx_0) + \dfrac{b(a + bx_0)}{a}\,[x].$

(10′) [log (d²y/dx²)] = log (a²b) + (a log e)[x] and [y − be^{ax}] = d + c[x].
(11′) [log {d²(log y)/dx²}] = log {(log b)(log a)²/(log e)²} + (log a)[x]
and [log y − a^x log b] = log d + (log c)[x].

(12′) $\left[\dfrac{Y_{k+2}}{Y_k}\right] = -e^{(a+c)\Delta x} + \left(e^{a\Delta x} + e^{c\Delta x}\right)\left[\dfrac{Y_{k+1}}{Y_k}\right]$
and $\left[y\,e^{-cx}\right] = d + b\left[e^{(a-c)x}\right].$

(13′) $\left[\dfrac{Y_{k+2}}{Y_k}\right] = -e^{2a\Delta x} + \left(2e^{a\Delta x}\cos b\Delta x\right)\left[\dfrac{Y_{k+1}}{Y_k}\right]$
and $\left[\dfrac{y\,e^{-ax}}{\cos bx}\right] = d + c[\tan bx].$

(14′) (For y = a + bx + cx²). $\left[\dfrac{y - y_0}{x - x_0}\right] = (b + 2cx_0) + c[x - x_0].$


1 For detailed discussion, see T. R. Running, Empirical Formulas (1917), or
J. Lipka, Graphical and Mechanical Computation (1918).

For the determination of the parameters a, b, c, ···, after the type of
equation has been selected, several methods are available, as follows:

(1) The method of selected points is perhaps the simplest, but should 
never be used except when only the crudest results are required. 

A curve with two parameters a, b, can obviously be made to pass
exactly through any two selected points, by properly choosing a and b; a
curve with three parameters can be made to pass exactly through any
three selected points; etc. But the resulting equation may not fit the
remaining points at all well, and the choice of the selected points is
entirely arbitrary.

(2) The graphical method (plotting a straight line as indicated in the 
table above) is very valuable not only as an aid to the selection of a 
suitable equation, but also as a means of determining the parameters. 

For example, in equation (1) y = a + bx, of the table, draw a straight
line through the plotted points (estimating the best position by means
of a fine thread, or a ruled piece of celluloid), and then measure the slope
of the line, and its intercept on the y-axis. The slope gives directly
the value of b, and the intercept the value of a.

Again, in equation (2), $y = be^{ax}$, we have $\log_{10}y = \log_{10}b + (.4343\,a)x$.
Hence if $\log_{10}y$ is plotted as ordinate against x as abscissa, the slope of
the line will give .4343 a, and the intercept of the line will give $\log_{10}b$,
whence a and b can at once be computed. (Note that $\log_{10}e = .4343$.)

Similarly in the other cases in the table. This graphical method will
give very fair results with a minimum amount of labor.
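
The passage from the measured line to the parameters of equation (2) is a short computation. The following Python sketch is an added illustration; the function name and the two points supposed to have been read from the drawn line are assumptions. It takes two points $(x, \log_{10}y)$ on the line, finds the slope and intercept, and converts them to a and b.

```python
import math


def exponential_from_line(point_1, point_2):
    """Parameters of y = b e^(ax) from two points (x, log10 y) read off the plotted line."""
    (x1, l1), (x2, l2) = point_1, point_2
    slope = (l2 - l1) / (x2 - x1)          # equals (log10 e) * a, i.e. .4343 a
    intercept = l1 - slope * x1            # equals log10 b
    a = slope / math.log10(math.e)
    b = 10 ** intercept
    return a, b


# Hypothetical readings taken from a line corresponding to y = 3 e^(0.7 x).
a, b = exponential_from_line((0.0, math.log10(3.0)),
                             (2.0, math.log10(3.0 * math.exp(1.4))))
```

In practice the two points should be taken well apart on the drawn line, not from the observations themselves, so that errors of reading have the least effect on the slope.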

(3) The method of averages determines the slope and intercept of the 
straight line in question, not by a graphical measurement, but by a 
simple arithmetical computation. 

For example, suppose two parameters, a, b, are to be determined.
Substitute the given coordinates in the equation of the straight line,
thus obtaining n equations of the first degree in a and b as unknowns.
Separate these equations into two approximately equal groups; in each
group, add the equations together and divide by their number, thus
obtaining two “average” equations of the first degree, from which the
two unknowns, a and b, can at once be found. If there are three
unknowns, separate the given equations into three groups; etc.

This method of averages is the one most often used in practice. A de- 
fect of the method lies in the fact that the manner of grouping the equa- 
tions is quite arbitrary, and the results will differ somewhat for different 
groupings. 
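
For the straight line y = a + bx the method of averages may be sketched as below. This is an added illustration under stated assumptions: the function name, the grouping into a first and a second half, and the sample data are hypothetical choices, and another grouping would in general give somewhat different values.

```python
def method_of_averages(xs, ys):
    """Fit y = a + b x by averaging the observation equations in two groups."""
    half = len(xs) // 2
    groups = [(xs[:half], ys[:half]), (xs[half:], ys[half:])]
    # Each "average" equation reads: mean(y) = a + b * mean(x).
    (x1, y1), (x2, y2) = [(sum(gx) / len(gx), sum(gy) / len(gy))
                          for gx, gy in groups]
    b = (y2 - y1) / (x2 - x1)
    a = y1 - b * x1
    return a, b


# Hypothetical observations lying exactly on y = 1 + 2x.
xs = [0, 1, 2, 3, 4, 5]
ys = [1 + 2 * x for x in xs]
a, b = method_of_averages(xs, ys)
```

The observations are here taken in order of x before grouping, which is the usual practice; if the two groups had nearly equal mean abscissae the two average equations would be nearly identical and the slope poorly determined.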

(4) The method of least squares (see below) is the standard method for
determining the parameters whenever accurate results are required.

If the selected equation is linear in the parameters, as in case of
equations (1), (4), and (14), the method, though laborious, is perfectly
straightforward and always yields a definite result.

If the selected equation is not linear in the parameters, but an expan- 
sion by Taylor’s theorem is possible, then the method will still yield a 
definite result, by a process of successive approximation (see below) ; 
but often only at the cost of excessive labor. 

In such cases it is customary to replace the given equation by the equa- 
tion of the straight line indicated in the table above, which will be linear 
in certain functions of the parameters; these functions of the parameters 
are then determined by the method of least squares, and the parameters 
themselves are computed from these results. It should be noted, how- 
ever, that this indirect process will not always give the same result as a 
direct application of the method. 

(5) The method of moments (see below) is a second systematic method 
of somewhat wider applicability than the method of least squares, since 
in many cases where the method of least squares would require ex- 
tremely laborious successive approximations, the parameters of the 
selected equation can be expressed explicitly in terms of the moments 
of the curve. 

Even when the process of finding the parameters from the moments 
requires the solution of a numerical equation by trial and error (as is 
the case with most of the equations in the table above) the labor is 
usually much less than if the method of least squares were employed ; 
for numerous examples, see Karl Pearson, ‘‘ On the Systematic Fitting 
of Curves,” Biometrika, vol. 1 (1902), pp. 265-303. 

The method is especially useful in the fitting of frequency curves, as 
will be seen in Chapter VII. 


DETERMINATION OF THE PARAMETERS BY THE 
METHOD OF LEAST SQUARES 


According to the “method of least squares,” the parameters a, b, c,
in the selected equation y = f(x, a, b, c), should be so determined that
the sum of the squares of the “residuals,” namely

$$[f(x_1, a, b, c) - Y_1]^2 + [f(x_2, a, b, c) - Y_2]^2 + \cdots + [f(x_n, a, b, c) - Y_n]^2,$$

(where $Y_1, Y_2, \cdots Y_n$ are the observed values) shall be a minimum. It
is easily shown that the following working rules will meet this
requirement. There are two cases to be considered.

Case 1. Suppose the selected function f(x, a, b, c) is linear in the
parameters a, b, c; that is, suppose

$$f(x, a, b, c) = K + Aa + Bb + Cc,$$

where K, A, B, C are any given functions of x not involving a, b, c.
First, write the n “observation equations”

$$\begin{array}{ll}
A_1a + B_1b + C_1c + K_1 - Y_1 = 0, & S_1\\
A_2a + B_2b + C_2c + K_2 - Y_2 = 0, & S_2\\
\cdots & \cdots\\
A_na + B_nb + C_nc + K_n - Y_n = 0, & S_n
\end{array}$$

where $A_1, B_1, C_1, K_1$ are the values of the known functions A, B, C, K when
$x = x_1$, and $Y_1$ is the observed value of Y when $x = x_1$; etc.; while a, b, c
are to be found. The numbers $S_1, S_2, \cdots S_n$ on the right are check
numbers inserted for use later; each S being the value of the expression
on the left when a = 1, b = 1, c = 1; thus $S_1 = A_1 + B_1 + C_1 + K_1 - Y_1$.

Next, form the three “ normal equations”’ as follows: (1) Multiply 
each of the n equations by the coefficient of a in that equation and add; 
this gives the first normal equation. (2) Multiply each of the n equa- 
tions by the coefficient of b in that equation and add; this gives the sec- 
ond normal equation. (3) Multiply each of the n equations by the 
coefficient of c in that equation and add; this gives the third normal 
equation. In the customary notation: 


(1) [AA]a + [AB]b + [AC]c + [AK] − [AY] = 0.   [AS]
(2) [BA]a + [BB]b + [BC]c + [BK] − [BY] = 0.   [BS]
(3) [CA]a + [CB]b + [CC]c + [CK] − [CY] = 0.   [CS]


Here the square brackets are used in a special sense, indicating
summation, thus:

$$[AA] = A_1^2 + A_2^2 + \cdots + A_n^2;$$
$$[AB] = A_1B_1 + A_2B_2 + \cdots + A_nB_n;$$
$$[AY] = A_1Y_1 + A_2Y_2 + \cdots + A_nY_n; \text{ etc.}$$

The check numbers on the right are built up like any of the other
coefficients; thus:

$$[AS] = A_1S_1 + A_2S_2 + \cdots + A_nS_n,$$
$$[BS] = B_1S_1 + B_2S_2 + \cdots + B_nS_n; \text{ etc.}$$

If the computation is correct, the check number on the right should
equal, in each case, the value of the expression on the left when a = 1,
b = 1, c = 1.

Finally, solve the normal equations for a, b, c, according to a systematic
schedule which will be sufficiently clear from the example worked out
below (for four parameters a, b, c, d). The check numbers on the right,
except in the cases of the original equations (1), (2), (3), (4), are built
up like any of the other coefficients; the control consists in the fact
that in each equation the check number on the right should equal the
value of the expression on the left when a = 1, b = 1, c = 1, d = 1.
By the aid of the check column, the computation may be controlled at
the end of every line, or only occasionally, as preferred; or the whole
check column may be omitted, except the final control at the bottom of
the column.
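
The formation of the normal equations, with their check numbers, can be sketched as follows. This Python fragment is an added illustration, not the schedule of the text; the function name, the layout of the inputs, and the sample observations are assumptions. Each observation equation is supplied as its coefficients, its known term K, and its observed value Y.

```python
def normal_equations(observations):
    """observations: list of (coefficients, K, Y), one per observation equation."""
    p = len(observations[0][0])
    matrix = [[0.0] * p for _ in range(p)]     # [AA], [AB], ... in the usual notation
    constant = [0.0] * p                       # [AK] - [AY], [BK] - [BY], ...
    check = [0.0] * p                          # [AS], [BS], ...
    for coefficients, K, Y in observations:
        S = sum(coefficients) + K - Y          # check number of this equation
        for i, ci in enumerate(coefficients):
            for j, cj in enumerate(coefficients):
                matrix[i][j] += ci * cj
            constant[i] += ci * (K - Y)
            check[i] += ci * S
    return matrix, constant, check


# Hypothetical observations lying on y = 2 + 3x; here A = 1, B = x, K = 0.
observations = [((1.0, float(x)), 0.0, 2.0 + 3.0 * x) for x in range(5)]
matrix, constant, check = normal_equations(observations)

# Control: each check number equals the left member with a = b = 1.
controls = [sum(row) + k for row, k in zip(matrix, constant)]

# With two unknowns the normal equations are solved at once by Cramer's rule.
(aa, ab), (ba, bb) = matrix
det = aa * bb - ab * ba
a = (-constant[0] * bb + ab * constant[1]) / det
b = (-aa * constant[1] + ba * constant[0]) / det
```

The agreement of `check` with `controls` is the same control that the check column provides in a hand computation: an error in any one product sum would disturb it.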

Example of the solution of normal equations. Given, the four normal
equations, (1), (2), (3), (4); to solve for the four unknowns, a, b, c, d.


(1)             153.000 a +  6.285 b +  2.485 c − 27.831 d − 23.350 = 0 |   110.589
(2)               6.285 a +  8.989 b +  4.037 c −  0.426 d +  3.697 = 0 |    22.582
(3)               2.485 a +  4.037 b + 23.616 c −  3.504 d +  9.556 = 0 |    36.190
(4)            − 27.831 a −  0.426 b −  3.504 c +  9.080 d −  6.936 = 0 | − 29.617
Hence
(A 1)           a +  .0410784 b +  .0162418 c −  .1819020 d −  .1526144 = 0 | +   .7228039
(A 2)           a + 1.4302307 b +  .6423230 c −  .0677804 d +  .5882259 = 0 | +  3.5929992
(A 3)           a + 1.6245473 b + 9.5034205 c − 1.4100604 d + 3.8454728 = 0 | + 14.5633803
(A 4)           a +  .0153067 b +  .1259028 c −  .3262549 d +  .2492185 = 0 | +  1.0641730
(A 1) − (A 2)     − 1.3891523 b −  .6260812 c −  .1141216 d −  .7408403 = 0 | −  2.8701953
(A 2) − (A 3)     −  .1943166 b − 8.8610975 c + 1.3422800 d − 3.2572469 = 0 | − 10.9703811
(A 3) − (A 4)     + 1.6092406 b + 9.3775177 c − 1.0838055 d + 3.5962543 = 0 | + 13.4992073
Hence
(B 1)           b +   .4506930 c +  .0821520 d +   .5333039 = 0 | +  2.066149
(B 2)           b + 45.601341 c − 6.907696 d + 16.762577 = 0   | + 56.456222
(B 3)           b +  5.8272938 c −  .6734888 d +  2.2347524 = 0 | +  8.388557
(B 1) − (B 2)     − 45.150648 c + 6.989848 d − 16.229273 = 0   | − 54.390073
(B 2) − (B 3)     + 39.774047 c − 6.234207 d + 14.527825 = 0   | + 48.067665
Hence
(C 1)           c − .1548117 d + .3594472 = 0 | + 1.2046355
(C 2)           c − .1567406 d + .3652589 = 0 | + 1.2085183
(C 1) − (C 2)     + .0019289 d − .0058117 = 0 | −  .0038828
Hence
(D 1)           d − 3.01296 = 0 | − 2.01296

From (D 1):     d = 3.01296
From (C 1):     c = + .466441 − .359447 = .106994
From (B 1):     b = − .0482214 − .2475207 − .5333039 = − .829046
From (A 1):     a = + .0340559 − .0017378 + .5480634 + .1526144 = .732996
In (4) (check): − 20.4000 + .3532 − .3749 + 27.3577 − 6.9360 = 0.000
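
The result of the schedule may be verified by any independent method of solving linear equations. The sketch below is an added illustration: it uses ordinary elimination with pivoting in place of the schedule of the text, and the function name is an assumption. The coefficients are those of equations (1) to (4) above, with the constant terms transposed to the right.

```python
def solve_linear(matrix, rhs):
    """Gaussian elimination with partial pivoting, followed by back-substitution."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


coefficients = [
    [153.000, 6.285, 2.485, -27.831],
    [6.285, 8.989, 4.037, -0.426],
    [2.485, 4.037, 23.616, -3.504],
    [-27.831, -0.426, -3.504, 9.080],
]
constants = [-23.350, 3.697, 9.556, -6.936]          # constant terms of (1) to (4)
a, b, c, d = solve_linear(coefficients, [-k for k in constants])
```

The values so found agree with those of the schedule to about four decimal places; the last figures of the hand computation are affected by the small divisor .0019289 in the line (C 1) − (C 2).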
Case 2. If the selected function f(x, a, b, c) is not linear in a, b, c, we
may be able to find the required values of a, b, c by the following process
of successive approximation; but the labor of the computation may often
become extremely great, or indeed prohibitive. Let $a_0, b_0, c_0$ be a first
approximation, obtained in any manner, and let $\Delta a, \Delta b, \Delta c$
be the "corrections" which must be applied; that is, let

$$a = a_0 + \Delta a, \quad b = b_0 + \Delta b, \quad c = c_0 + \Delta c.$$

Then, if we write $K = f(x, a_0, b_0, c_0)$ and

$$A = f'_a(x, a_0, b_0, c_0), \quad B = f'_b(x, a_0, b_0, c_0), \quad C = f'_c(x, a_0, b_0, c_0)$$

(where $f'_a$ denotes partial differentiation with respect to a; etc.), we
shall have, by Taylor's theorem:

$$f(x, a, b, c) = K + A\,\Delta a + B\,\Delta b + C\,\Delta c + \cdots$$

Since K, A, B, C are functions of x alone, this expression is linear in the
corrections $\Delta a, \Delta b, \Delta c$, so that we may build, as in case 1,
a set of n "observation equations" for $\Delta a, \Delta b, \Delta c$, leading
to the three following "normal equations":

$$[AA]\Delta a + [AB]\Delta b + [AC]\Delta c + [AK] - [AY] = 0,$$

$$[BA]\Delta a + [BB]\Delta b + [BC]\Delta c + [BK] - [BY] = 0,$$

$$[CA]\Delta a + [CB]\Delta b + [CC]\Delta c + [CK] - [CY] = 0.$$

Solving these equations for $\Delta a, \Delta b, \Delta c$, we have
$a = a_0 + \Delta a$, $b = b_0 + \Delta b$, $c = c_0 + \Delta c$ as our
second approximations.

Using these values in place of $a_0, b_0, c_0$, we may repeat the process,
thus finding a third approximation; and so on, until the "corrections"
become of negligible size.
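
The process of successive approximation may be sketched as follows. This is
an illustration under stated assumptions, not a prescription: the curve
y = a exp(bx) + c, the made-up exact "observations," the starting values, and
the names solve3, partials, and successive_approximation are all hypothetical.
Each round forms the sums [AA], [AB], ... and [AY] - [AK], solves the three
normal equations for the corrections, and applies them.

```python
import math


def solve3(m, rhs):
    """Solve a 3x3 linear system m * x = rhs by Gaussian elimination."""
    rows = [list(m[i]) + [rhs[i]] for i in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, 3):
            factor = rows[r][col] / rows[col][col]
            for k in range(col, 4):
                rows[r][k] -= factor * rows[col][k]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        known = sum(rows[i][j] * x[j] for j in range(i + 1, 3))
        x[i] = (rows[i][3] - known) / rows[i][i]
    return x


def f(x, a, b, c):
    """Hypothetical curve, not linear in b: y = a * exp(b * x) + c."""
    return a * math.exp(b * x) + c


def partials(x, a, b, c):
    """A, B, C of the text: partial derivatives of f in a, b, c."""
    e = math.exp(b * x)
    return e, a * x * e, 1.0


def successive_approximation(xs, ys, start, tol=1e-10, max_rounds=50):
    a, b, c = start
    for _ in range(max_rounds):
        normal = [[0.0] * 3 for _ in range(3)]
        rhs = [0.0] * 3
        for x, y in zip(xs, ys):
            k = f(x, a, b, c)                       # K of the text
            grads = partials(x, a, b, c)
            for i in range(3):
                for j in range(3):
                    normal[i][j] += grads[i] * grads[j]   # [AA], [AB], ...
                rhs[i] += grads[i] * (y - k)              # [AY] - [AK], ...
        da, db, dc = solve3(normal, rhs)
        a, b, c = a + da, b + db, c + dc
        if max(abs(da), abs(db), abs(dc)) < tol:    # corrections negligible
            break
    return a, b, c


XS = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
YS = [f(x, 2.0, 0.5, 1.0) for x in XS]       # exact, made-up observations
a_fit, b_fit, c_fit = successive_approximation(XS, YS, start=(1.8, 0.45, 0.8))
print(a_fit, b_fit, c_fit)
```

The sketch starts close to the true values; with a poor first approximation
the corrections need not diminish, which is one form of the "prohibitive"
labor mentioned above.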


DETERMINATION OF THE PARAMETERS BY THE 
METHOD OF MOMENTS 


According to the "method of moments," the parameters a, b, c should be so
determined that as many as possible of the "moments" of the mathematical
curve shall equal the corresponding "moments" of the empirical curve; the
moments being taken about the y-axis, either as "moments of ordinates" or as
"moments of areas." (For illustrative examples, see Chapter VII.)

Moments of ordinates. In rough work, the nth moment of a curve may be taken
as the sum of the nth moments of the several ordinates erected at the points
$x_1, x_2, \cdots, x_n$ in question; the nth moment of an ordinate $y_i$
being the length of the ordinate multiplied by the nth power of its distance
from the y-axis.
Thus, in the case of the mathematical curve y = f(x, a, b, c), the
successive moments of the curve, beginning with the zero-th, are

$$m_0 = \sum (y_i), \quad m_1 = \sum (x_i y_i), \quad m_2 = \sum (x_i^2 y_i), \quad m_3 = \sum (x_i^3 y_i), \cdots$$

where the summation extends over $i = 1, 2, 3, \cdots, n$.
In the case of the empirical curve, the successive moments of the curve
are:

$$M_0 = \sum (Y_i), \quad M_1 = \sum (x_i Y_i), \quad M_2 = \sum (x_i^2 Y_i), \quad M_3 = \sum (x_i^3 Y_i), \cdots$$

Hence the equations for determining a, b, c are:

$$\sum f(x_i, a, b, c) = \sum (Y_i),$$
$$\sum x_i f(x_i, a, b, c) = \sum (x_i Y_i),$$
$$\sum x_i^2 f(x_i, a, b, c) = \sum (x_i^2 Y_i).$$

If the selected function f(x, a, b, c) is linear in a, b, c, these equations
can be readily solved. If it is not linear, successive approximations may
be used, as in the method of least squares.

(It can be shown that in case the selected curve is a polynomial,
$y = a + bx + cx^2 + \cdots$, the method of moments, in the rough form
just described, will give precisely the same result as the method of least
squares.)
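
The agreement with least squares for a polynomial may be illustrated by a
short sketch in Python. The data are made up, and the names solve3,
fit_by_moments, and sum_of_squares are hypothetical. The parabola
y = a + bx + cx^2 is fitted by equating the moments of ordinates of orders
0, 1, 2; the sketch then confirms that a small change in any one parameter
does not reduce the sum of squared residuals.

```python
def solve3(m, rhs):
    """Solve a 3x3 linear system m * x = rhs by Gaussian elimination."""
    rows = [list(m[i]) + [rhs[i]] for i in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, 3):
            factor = rows[r][col] / rows[col][col]
            for k in range(col, 4):
                rows[r][k] -= factor * rows[col][k]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        known = sum(rows[i][j] * x[j] for j in range(i + 1, 3))
        x[i] = (rows[i][3] - known) / rows[i][i]
    return x


def fit_by_moments(xs, ys):
    """Fit y = a + b*x + c*x**2 by equating moments of ordinates 0, 1, 2."""
    # Power sums S_k = sum(x**k) and empirical moments M_k = sum(x**k * Y).
    s = [sum(x ** k for x in xs) for k in range(5)]
    big_m = [sum(x ** k * y for x, y in zip(xs, ys)) for k in range(3)]
    # k-th moment equation: a*S_k + b*S_(k+1) + c*S_(k+2) = M_k.
    matrix = [[s[k], s[k + 1], s[k + 2]] for k in range(3)]
    return solve3(matrix, big_m)


def sum_of_squares(params, xs, ys):
    a, b, c = params
    return sum((a + b * x + c * x * x - y) ** 2 for x, y in zip(xs, ys))


XS = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
YS = [1.1, 1.9, 4.2, 8.1, 13.8, 21.2]        # made-up observations
fit = fit_by_moments(XS, YS)
best = sum_of_squares(fit, XS, YS)

# Nudge each parameter up and down; none of the nudges should do better.
nudged = []
for i in range(3):
    for step in (-0.01, 0.01):
        trial = list(fit)
        trial[i] += step
        nudged.append(sum_of_squares(trial, XS, YS))
print(fit, best, min(nudged))
```

The reason is visible in fit_by_moments: for a polynomial, the three moment
equations are term for term the normal equations of least squares.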

Moments of areas. In refined work it is usual to define the moments,
not as moments of ordinates, but rather as moments of elementary areas.

In the case of the mathematical curve, the element of area is obviously
y dx, and the successive moments (between the limits $x = x_1$ and
$x = x_n$) are given by the definite integrals

$$m_0 = \int y\,dx, \quad m_1 = \int xy\,dx, \quad m_2 = \int x^2 y\,dx, \quad m_3 = \int x^3 y\,dx, \cdots$$

Here $m_0$ is simply the area under the curve; $m_1$ is the first, or
"static," moment of the area; $m_2$ is the second moment, or "moment of
inertia," of the area; etc.

In the case of the empirical curve, on the other hand, only a finite 
number of ordinates are given, and it is necessary to agree, first of all, 
on what shall be taken as the element of area. 

The simplest plan is to fill out the curve by constructing a series of
rectangles, each of the given ordinates serving as the midordinate of
one of the rectangles.¹ On this plan, assuming that the ordinates are
equally spaced, at distance $\Delta x$ apart, the successive moments of the
empirical curve are

$$M_0 = \sum (Y_i\,\Delta x), \quad M_1 = \sum (x_i Y_i\,\Delta x), \quad M_2 = \sum (x_i^2 Y_i\,\Delta x), \quad M_3 = \sum (x_i^3 Y_i\,\Delta x), \cdots$$

The equations for determining a, b, c are then

$$\int f(x, a, b, c)\,dx = \sum (Y_i\,\Delta x),$$
$$\int x f(x, a, b, c)\,dx = \sum (x_i Y_i\,\Delta x),$$
$$\int x^2 f(x, a, b, c)\,dx = \sum (x_i^2 Y_i\,\Delta x).$$

If the selected function f(x, a, b, c) is linear in a, b, c:

$$f(x, a, b, c) = K(x) + aA(x) + bB(x) + cC(x),$$

and if the integrals $\int A(x)\,dx$, etc., can be found (by mechanical
quadrature if necessary), then these equations can be readily solved for
a, b, c. If the function is not linear in a, b, c, recourse may again be had
to the method of successive approximation.

¹ A more refined plan is to fill out the curve, section by section, by the
use of some one of the formulas of interpolation (pages 36-45), and then to
use the area under each section of this curve in computing the moments
$M_0, M_1, M_2, M_3, \cdots$. See Karl Pearson, "On the Systematic Fitting of
Curves to Observations and Measurements," Biometrika, vol. 1 (1902), p. 265.

Moments about the mean. If the moments of a curve about an arbitrary
y-axis are

$$m_0 = \int y\,dx, \quad m_1 = \int xy\,dx, \quad m_2 = \int x^2 y\,dx, \cdots,$$

and the moments of the curve about a parallel axis through the center
of gravity of the area are

$$\mu_0 = \int y\,dx, \quad \mu_1 = \int (x - \bar{x})\,y\,dx, \quad \mu_2 = \int (x - \bar{x})^2 y\,dx, \cdots$$

(where $\bar{x}\int y\,dx = \int xy\,dx$), then the $\mu$'s can be computed
from the m's by the following relations:

$$\mu_0 = m_0.$$

$$\mu_1 = 0.$$

$$\mu_2 = m_2 - m_1^2/m_0.$$

$$\mu_3 = m_3 - 3 m_1 m_2/m_0 + 2 m_1^3/m_0^2.$$

$$\mu_4 = m_4 - 4 m_1 m_3/m_0 + 6 m_1^2 m_2/m_0^2 - 3 m_1^4/m_0^3.$$

$$\mu_5 = m_5 - 5 m_1 m_4/m_0 + 10 m_1^2 m_3/m_0^2 - 10 m_1^3 m_2/m_0^3 + 4 m_1^5/m_0^4.$$

$$\mu_6 = m_6 - 6 m_1 m_5/m_0 + 15 m_1^2 m_4/m_0^2 - 20 m_1^3 m_3/m_0^3 + 15 m_1^4 m_2/m_0^4 - 5 m_1^6/m_0^5.$$
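
The relations above all follow from expanding $(x - \bar{x})^k$ by the
binomial theorem, and a short sketch in Python may make the pattern plain.
The function name moments_about_mean and the ordinates used for the check
are hypothetical; the moments in the check are taken as moments of
ordinates, for simplicity.

```python
from math import comb


def moments_about_mean(raw):
    """Convert [m0, m1, m2, ...] about an arbitrary axis into
    [mu0, mu1, mu2, ...] about the parallel axis through the mean.

    Uses mu_k = sum_j C(k, j) * (-xbar)**(k - j) * m_j, xbar = m1 / m0.
    Needs at least m0 and m1.
    """
    xbar = raw[1] / raw[0]
    central = []
    for k in range(len(raw)):
        central.append(
            sum(comb(k, j) * (-xbar) ** (k - j) * raw[j] for j in range(k + 1))
        )
    return central


# Made-up ordinates, used only to check the conversion.
XS = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
YS = [2.0, 5.0, 9.0, 7.0, 4.0, 1.0]

RAW = [sum(x ** k * y for x, y in zip(XS, YS)) for k in range(7)]
MU = moments_about_mean(RAW)

# Direct computation about the mean, for comparison.
XBAR = RAW[1] / RAW[0]
DIRECT = [sum((x - XBAR) ** k * y for x, y in zip(XS, YS)) for k in range(7)]
print(MU)
```

For k = 2, 3, 4 the general expression reduces to the second, third, and
fourth relations written out above.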


For moments of frequency curves, with Sheppard's corrections for grouping,
etc., see Chapter VII.


CHAPTER V 


RANDOM SAMPLING 
By H. L. RIETZ 


INTRODUCTION 


STATISTICAL inferences relating to a class of individuals, say to a large 
population or race, are very commonly based on observation of part of 
the population taken at random from the large group. Such a part of 
the population is called a random sample. 

Random sampling can perhaps be illustrated most simply by repeated
trials in a game of chance. Thus, suppose we make repeated sets of
drawings of 100 balls from an urn, always with a constant probability
1/4 that a ball to be drawn will be white. The number of white balls
drawn per set of 100 will fluctuate about 25 as a most probable value
(see Bernoulli's theorem, p. 16). These chance fluctuations are often
called fluctuations in random sampling. The determination of certain
properties of the frequencies obtained in such drawings is one of the
simplest problems of the theory of sampling (see p. 72).

We may also illustrate random sampling by considering the determination
of the characteristics of a race; say we are to describe the height or
weight of the adult white male population of a country. It would be
unnecessary to measure all the individuals to obtain a high degree of
accuracy in the averages. We should almost surely attempt merely
to measure an adequate random sample of individuals and to construct
our science on the basis of results from the sample. To be sure, the
question arises: What is an adequate sample for a particular purpose?
The theory of sampling throws some light on this question.

To be more concrete, let us conceive of taking 1000 random samples
of 1000 individuals each. Each of these 1000 samples would have its
own arithmetic mean, median, standard deviation, and so on. These
1000 arithmetic means would probably differ but slightly from each
other in comparison with differences between extreme individuals, but
if the measurements are sufficiently accurate, the means would form a
frequency distribution. This frequency distribution of means would
have its mean (a mean of means) and standard deviation. It is this
standard deviation in which we are especially interested, for it gives a
measure of the variability of the means of samples.
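
The conception may be tried out by simulation. The sketch below is in
Python and rests on assumptions that are not in the text: a hypothetical
normal population of heights with mean 68 and standard deviation 3, a fixed
seed, and 400 samples of 400 individuals each (fewer than the 1000 samples of
1000 conceived above, merely to keep the run short). It compares the
standard deviation of the sample means with the familiar value, the
standard deviation of the population divided by the square root of the
number in a sample.

```python
import random
import statistics

random.seed(12345)                     # fixed seed so the run is repeatable

POP_MEAN, POP_SD = 68.0, 3.0           # hypothetical population of heights
N_SAMPLES, SAMPLE_SIZE = 400, 400      # smaller than in the text, for speed

means = []
for _ in range(N_SAMPLES):
    sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(SAMPLE_SIZE)]
    means.append(statistics.fmean(sample))

mean_of_means = statistics.fmean(means)
sd_of_means = statistics.pstdev(means)           # variability of the means
expected_sd = POP_SD / SAMPLE_SIZE ** 0.5        # sigma / sqrt(n)

print(f"mean of means      = {mean_of_means:.3f}")
print(f"s.d. of the means  = {sd_of_means:.4f}")
print(f"sigma / sqrt(n)    = {expected_sd:.4f}")
```

The individuals vary with a standard deviation of 3, while the means vary
with a standard deviation of roughly 0.15; this is the sense in which the
means "differ but slightly from each other."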

Along with the above introductory statement about the general 
meaning of random sampling, it should be emphasized that the existence 
of a sampling problem in a statistical investigation depends on the pur- 
pose of the investigation. On the one hand, there may be no random 
sampling problem involved because the purpose is merely to describe 
the facts about the sample. On the other hand, fundamental problems 
of random sampling arise when the purpose is to predict the properties 
of an aggregate, or to test a hypothesis about an aggregate by obser- 
vations on a sample. 


THE BERNOULLI THEOREM 


Applications of the Bernoulli theorem (p. 16) to random sampling.
The treatment of random sampling is simplest when we are concerned
merely with the fluctuations in frequency where the a priori probability
p of success in each trial is a constant and where the trials are
independent in the probability sense.

Under these conditions, the a priori most probable frequency distribution
in N sets of n trials each is given by the terms in the expansion
of the binomial

$$N(p + q)^n, \quad \text{where } q = 1 - p. \qquad (1)$$

The most probable number, m, of successes in a set of n trials is m = np
when np is an integer,¹ and the standard deviation of the number of
successes in sets of n trials is $\sqrt{npq}$.

To find the probability that the number of successes $n_1$ in any set
of n trials will fall within a certain conveniently assigned deviation c
from the most probable value np would, in the simplest cases, merely
involve evaluating certain terms in the expansion (1), but this method
would, in general, be impracticable when n is large. The probability
in question is given by the theorem of Bernoulli. In making a random
set of n trials with constant probability p of success in each instance,
the theorem states that the probability that a deviation $n_1 - np$ in the
number of successes will not exceed an assigned number c, or that the
deviation of the corresponding statistical ratio $n_1/n$ from p will not
exceed $c/n$, is given by

$$P_c = \frac{2}{\sqrt{2\pi npq}} \int_0^c e^{-\frac{x^2}{2npq}}\,dx + \frac{e^{-\frac{c^2}{2npq}}}{\sqrt{2\pi npq}}. \qquad (2)$$

¹ In any case, $np - q \le m \le np + p$.

The application should be restricted to cases where p is not very small
in comparison with q (or, of course, q with respect to p). Tables¹ have
been prepared which indicate that the formula should hardly be regarded
as practicable when p < .03.

For the usual purposes of application to statistical problems, the
second term in the right-hand member of (2) is so small in comparison
with the first term that it may be neglected. Then we have

$$P_c = \frac{2}{\sqrt{2\pi npq}} \int_0^c e^{-\frac{x^2}{2npq}}\,dx. \qquad (3)$$

This approximation is simply equivalent to the result of assuming a
distribution of frequencies in accord with the normal probability function

$$y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{x^2}{2\sigma^2}}, \qquad (4)$$

where $\sigma = \sqrt{npq}$.


Example 1. In the third edition of American Men of Science, by Cattell and
Brimhall (1921), p. 804, it is stated that the group of scientific men
reported 716 sons and 668 daughters. The data very naturally suggest the old
question as to whether these data are at variance with a hypothetical sex
ratio 1/2. In the language of probability, the data suggest the question of
simple statistical sampling: In throwing 1384 coins, what is the probability
that the number of heads will differ from 1384/2 = 692 by more than
692 - 668 = 24 on either side?

The probability in question is $1 - P_c$, where we take c = 24.5 in (3). To
find $P_c$ we may use the table of normal probability integral (table,
p. 211), where we find, corresponding to a deviation

$$\frac{c}{\sqrt{npq}} = \frac{24.5}{\sqrt{346}} = 1.3171,$$

the value $P_c$ = .812. Hence the probability that a random deviation will
be greater than 24 is 1 - .812 = .188. This fact may be expressed roughly by
saying that a deviation larger than 24 will occur in the long run slightly
more than once per six trials.
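
In place of the printed table, the integral in (3) may be evaluated with the
error function. The following Python sketch repeats the computation of
Example 1; the function name prob_within is hypothetical, and the half unit
added to the deviation of 24 follows the choice c = 24.5 made in the example.

```python
import math


def prob_within(c, n, p):
    """P_c of formula (3): chance that |n1 - n*p| does not exceed c,
    on the normal approximation with sigma = sqrt(n*p*q)."""
    sigma = math.sqrt(n * p * (1.0 - p))
    return math.erf(c / (sigma * math.sqrt(2.0)))


SONS, DAUGHTERS = 716, 668
N = SONS + DAUGHTERS                    # 1384 children in all
P = 0.5                                 # hypothetical sex ratio
C = abs(SONS - N * P) + 0.5             # 24 + 0.5 = 24.5

ratio = C / math.sqrt(N * P * (1.0 - P))    # deviation in units of sigma
p_c = prob_within(C, N, P)

print(f"c / sqrt(npq) = {ratio:.4f}")
print(f"P_c           = {p_c:.3f}")
print(f"1 - P_c       = {1.0 - p_c:.3f}")
```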


Probable error in the number of successes. For a normal distribution
of frequencies, the probable error or quartile deviation is that
deviation c on either side of the most probable value such that
$P_c = \tfrac{1}{2}$. By reference to the table on page 210, it is found that

$$c = .6745\sqrt{npq} \quad \text{when} \quad P_c = \tfrac{1}{2}.$$

That is, the probable error of the frequency $n_1$ is

$$PE = .6745\sqrt{npq}, \qquad (5)$$

and the probable error of the relative frequency $n_1/n$ is

$$PE = .6745\sqrt{\frac{pq}{n}}. \qquad (6)$$

¹ Lucy Whitaker, "On the Poisson Law of Small Numbers," Biometrika, vol. 10
(1914), p. 41.


Example 2. Suppose we have 1000 students in a college, of which 250 are women
and 750 are men, and that they have been assigned numbers 1, 2, 3, ..., 1000.
Let balls bearing the numbers be placed in an urn from which drawings are to
be made at random. Let us draw 100 balls, one at a time at random, under the
conditions that each ball is to be replaced so as to keep the probability of
drawing a woman's number equal to 1/4. By (5) the probable error of the
number of women's numbers drawn is

$$PE = .6745\sqrt{100\left(\tfrac{1}{4}\right)\left(\tfrac{3}{4}\right)} = 2.92.$$

By (6), the probable error of the relative frequency of drawing a woman's
number is

$$PE = .6745\sqrt{\frac{\left(\tfrac{1}{4}\right)\left(\tfrac{3}{4}\right)}{100}} = .0292.$$


We have thus far assumed a constant known probability p. In
statistical practice, we are usually obliged to obtain an approximation
to p from the data. We then assume that we can get a fair approximation
to p by finding a relative frequency of success $p' = n_1/n$, where n is a
large number, and $n_1$ is the number of successes.


Example 3. Suppose we do not know the relative number of men and women
in a large group, but we conceive of taking a random sample of 1000
consisting of 300 women and 700 men. We have 300/1000 as an approximation
to p. That is, p' = .3. This approximation is subject to the probable error

$$.6745\sqrt{\frac{p'q'}{n}} = .6745\sqrt{\frac{(.3)(.7)}{1000}} = .0098,$$

where q' = 1 - p'.

That is, it is an even wager that the true value p for the whole population
is between 0.3 - .0098 and 0.3 + .0098.
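
Formulas (5) and (6) are easily put into a few lines of Python, shown here
only as a sketch; the names pe_frequency and pe_relative_frequency are
hypothetical. The figures of Examples 2 and 3 serve as a check.

```python
import math

PE_FACTOR = 0.6745      # probable error = 0.6745 x standard deviation


def pe_frequency(n, p):
    """Formula (5): probable error of the number of successes in n trials."""
    return PE_FACTOR * math.sqrt(n * p * (1.0 - p))


def pe_relative_frequency(n, p):
    """Formula (6): probable error of the relative frequency n1/n."""
    return PE_FACTOR * math.sqrt(p * (1.0 - p) / n)


# Example 2: 100 drawings with probability 1/4 of a woman's number.
pe_count = pe_frequency(100, 0.25)
pe_ratio = pe_relative_frequency(100, 0.25)

# Example 3: p is unknown; p' = 300/1000 is used in its place.
p_prime = 300 / 1000
pe_sample = pe_relative_frequency(1000, p_prime)
even_wager = (p_prime - pe_sample, p_prime + pe_sample)

print(f"{pe_count:.2f}  {pe_ratio:.4f}  {pe_sample:.4f}")
```

In Example 3 the observed ratio p' is used where formula (6) calls for p,
which is the approximation described in the paragraph preceding the example.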


Probability of deviation greater than certain small multiples of PE.
The chances that the difference between a true value and the value
obtained from a sample will exceed numerically, and that it will not exceed
numerically, an assigned small multiple of the probable error give a
convenient form in which to express the probability of certain statistical
conclusions. These chances, as obtained from tables of probability
functions (pp. 210-16), are approximately as follows:


ASSIGNED DEVIATION    PROBABILITY OF EXCEEDING    PROBABILITY OF NOT EXCEEDING
        PE                    .5                            .5
      2 PE                    .177                          .823
      3 PE                    .0430                         .9570
      4 PE                    .00698                        .99302
      5 PE                    .000746                       .999254
      6 PE                    .000052                       .999948


These values can easily be expressed in terms of odds in favor of or
against assigned deviations (see Chapter VII, p. 100).
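
The entries of the table can be recomputed from the normal probability
function. The sketch below, in Python, assumes a normal distribution and
takes the probable error as .6745 times the standard deviation; the name
prob_exceeding is hypothetical.

```python
import math

PE_IN_SIGMA = 0.6745    # probable error as a multiple of the standard deviation


def prob_exceeding(k):
    """Chance that a normal deviation exceeds k probable errors numerically."""
    return math.erfc(k * PE_IN_SIGMA / math.sqrt(2.0))


TABLE = [(k, prob_exceeding(k), 1.0 - prob_exceeding(k)) for k in range(1, 7)]

for k, exceed, not_exceed in TABLE:
    odds_against = not_exceed / exceed       # odds against exceeding k PE
    print(f"{k} PE   {exceed:.6f}   {not_exceed:.6f}   odds {odds_against:.1f} to 1")
```

The last column printed gives the odds against exceeding the assigned
deviation, which is the form of statement mentioned in the closing sentence
above.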

In the application of probable error theory or other sampling theory,
the fact should be much emphasized that the samples must be chosen in
an unbiased manner; otherwise the use of the formulas $\sqrt{npq}$ and
$\sqrt{pq/n}$ is invalid.

THE BORTKEWITSCH “LAW OF SMALL NUMBERS” 


Application of Poisson's exponential limit: the Bortkewitsch "law¹
of small numbers." As implied on page 73, there is good reason for
restricting the applications of the Bernoulli theorem to cases where the
probability is not very small, say not smaller than .03. Thus, in
considering events which happen rarely (p or q small) we use tables² of
the Poisson exponential limit in place of the table of the normal
probability function in the description of fluctuations in sampling.


Example 4. According to the United States mortality statistics of 1920, there were 
9 deaths from measles per 100,000 of population. Let it be required to find the 
per cent of cases in which the number under random sampling would deviate more 
than 3 in excess of this number and the per cent of cases in which it would deviate 
more than 3 in defect of this number. 


From Tables for Statisticians and Biometricians, p. 122, there would 
be 12.42 per cent of the cases in excess more than 3, and 11.57 per cent 
of the cases in defect more than 3. Hence, in 23.99 per cent of the cases, 
we should get values deviating more than 3 from 9. 

From the Bernoulli theorem, we find that 24.33 per cent of the cases
would deviate from the most probable by more than 3. Thus, the
actual results in percentage of deviation above and below together
differ by only .34 per cent, but Poisson's exponential limit shows also the
difference between the per cent of cases in excess more than 3 and the
per cent in defect more than 3.

¹ L. von Bortkewitsch, Das Gesetz der kleinen Zahlen (1898).
² H. E. Soper, "Tables of the Poisson's Exponential Limit," Biometrika,
vol. 10 (1914), pp. 25-35. See also Tables for Statisticians and
Biometricians (1914), pp. 113-24.
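
The percentages of Example 4 may be recomputed without the printed tables.
In the Python sketch below, the Poisson terms are summed directly with mean
9, and the Bernoulli (normal) figure is obtained from formula (3). Two
points are assumptions of the sketch rather than statements of the text:
that "more than 3" is to be read as 13 or more in excess and 5 or fewer in
defect, and that c = 3.5 is taken in (3), by analogy with the half unit used
in Example 1. The function name poisson_cdf is hypothetical.

```python
import math


def poisson_cdf(k, mean):
    """Probability of k or fewer occurrences under the Poisson limit."""
    term = math.exp(-mean)      # probability of zero occurrences
    total = term
    for i in range(1, k + 1):
        term *= mean / i        # next term: mean**i / i! times e**(-mean)
        total += term
    return total


MEAN = 9.0                      # deaths per 100,000 of population
N, P = 100_000, 9 / 100_000

# Poisson exponential limit: 13 or more, and 5 or fewer.
excess = 100.0 * (1.0 - poisson_cdf(12, MEAN))
defect = 100.0 * poisson_cdf(5, MEAN)

# Normal approximation of formula (3), with c = 3.5 (assumed).
sigma = math.sqrt(N * P * (1.0 - P))
normal_both = 100.0 * math.erfc(3.5 / (sigma * math.sqrt(2.0)))

print(f"Poisson: {excess:.2f} in excess, {defect:.2f} in defect, "
      f"{excess + defect:.2f} together")
print(f"Normal:  {normal_both:.2f} together")
```

The normal approximation gives only the two tails together, and necessarily
divides them equally; the Poisson limit shows the excess to be the larger.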


THE PROBABLE ERROR 


Meaning of probable error in an average or other statistical constant.¹
The conception of fluctuations in sampling applied to simple frequencies
and relative frequencies is easily extended to statistical results such
as ordinary averages, moment coefficients, correlation coefficients, and
so on. Consider the sampling fluctuations of an arithmetic mean as an
illustration. For instance, let

$$m_1, m_2, \cdots, m_t$$

be the arithmetic means of heights of random samples of t sets of 1000
individuals each of a well-defined class of men. The means

$$m_1, m_2, \cdots, m_t$$



will form a frequency distribution whose standard deviation can be 
found in a form adapted to numerical calculation. This standard de- 
viation could be used as a measure of the fluctuations in sampling of the 
means. However, instead of using the standard deviation to measure 
the failure of the mean stature of the sample to agree with the mean 
stature of the large class of men, it is the usual practice to use the stand- 
ard deviation multiplied by the constant .6745, and to call this func- 
tion the probable error of the mean. When the means are normally 
distributed, this definition of probable error is equivalent to that 
given on page 73. 

We have discussed sampling fluctuations in arithmetic means, but
the conception would apply if $m_1, m_2, \cdots, m_t$ were any other type of
average or statistical constant obtained by random sampling.

Although we have given elsewhere in this Handbook formulas for 
the probable error in certain statistical averages and coefficients, we 
shall for convenience of reference now collect together the formulas 
for the probable error in some of the most important statistical 
constants. 

In the following, n denotes the number of observations; $n_1$, the number
of successes; $\sigma$ the standard deviation of the observations:


¹ Karl Pearson, "On the Probable Errors of Frequency Constants," Biometrika,
vol. 2, pp. 273-81.
STATISTICAL CONSTANT                                       PROBABLE ERROR

Relative frequency $p' = n_1/n$, $q' = 1 - p'$             $0.6745\sqrt{p'q'/n}$
Arithmetic mean                                            $0.6745\,\sigma/\sqrt{n}$
Median (normal distribution)                               $0.8454\,\sigma/\sqrt{n}$
Standard deviation (normal distribution)                   $0.6745\,\sigma/\sqrt{2n} = 0.4769\,\sigma/\sqrt{n}$
Quartile (normal distribution)                             $0.9191\,\sigma/\sqrt{n}$
Semi-interquartile range (normal distribution)             $0.5306\,\sigma/\sqrt{n}$
Coefficient of variation C (normal distribution)           $0.6745\,\frac{C}{\sqrt{2n}}\left[1 + 2\left(\frac{C}{100}\right)^2\right]^{1/2}$
Second moment $\mu_2$ about the mean (normal distribution) $0.6745\,\sigma^2\sqrt{2/n}$
Third moment $\mu_3$ about the mean (normal distribution)  $0.6745\,\sigma^3\sqrt{6/n}$
Fourth moment $\mu_4$ about the mean (normal distribution) $0.6745\,\sigma^4\sqrt{96/n}$
Coefficient of correlation r                               $0.6745\,(1 - r^2)/\sqrt{n}$
Correlation ratio $\eta$                                   $0.6745\,(1 - \eta^2)/\sqrt{n}$
Regression coefficient $r\,\sigma_y/\sigma_x$              $0.6745\,\frac{\sigma_y\sqrt{1 - r^2}}{\sigma_x\sqrt{n}}$
Tetrachoric correlation coefficient                        See Tables for Statisticians and Biometricians, pp. xl-xlii.
Bi-serial correlation coefficient                          See Biometrika, vol. 10, pp. 384-390.
$\beta_2$ for a normal distribution                        See Tables for Statisticians and Biometricians, p. lxii.
$\sqrt{\beta_1}$ for a normal distribution                 See Tables for Statisticians and Biometricians, p. lxii.

Small samples. In finding probable errors and other measures of
fluctuations in sampling, we have assumed that the number of individuals
n in a sample is large. These methods are trustworthy when the sample
is large, but it is not clear where the boundary is to be drawn between
large and small samples. A big problem of practical statistics arises
concerning the variation and distribution of averages based on small
samples. Considerable progress has been made recently with this problem
in relation to the arithmetic mean, standard deviation, and correlation
coefficient obtained from small samples. All we shall attempt
here is to direct attention to the work of Student,¹ Fisher,² and Pearson³
on this problem.


THE TEST OF GOODNESS OF FIT 


Pearson's test of goodness of fit. Given a frequency distribution
which represents a random sample taken from a large group. Let us
assume that we know the a priori most probable frequencies or that we
have fitted a theoretical frequency curve to such a distribution by
methods of curve-fitting. The question arises as to the goodness of the
fit of theory and observation. In considering this question, there is
needed a criterion to test whether the theoretical distribution fits the
observed distribution well or not.

Pearson’s criterion or test of goodness of fit gives a method of de- 
termining whether an observed distribution is described by a theoretical 
curve or distribution to within fluctuations which may reasonably be 
ascribed to random sampling. 


Let us assume that we have n observed frequencies

$$m'_1, m'_2, \cdots, m'_n$$

and n corresponding theoretical frequencies

$$m_1, m_2, \cdots, m_n.$$

¹ Student, "The Probable Error of the Mean," Biometrika, vol. 6 (1908),
pp. 1-25; "Probable Error of the Correlation Coefficient," Biometrika,
vol. 6 (1908), pp. 302-10.

² R. A. Fisher, "Frequency Distribution of the Values of Correlation
Coefficient in Samples from an Indefinitely Large Population," Biometrika,
vol. 10 (1915), pp. 507-21.

³ Karl Pearson, "On the Distribution of Standard Deviations of Small
Samples," Biometrika, vol. 10 (1915), pp. 522-29.

⁴ Karl Pearson, "On the Criterion that a given System of Deviations from the
Probable in the Case of a Correlated System of Variables is such that it can
be reasonably supposed to have arisen from Random Sampling," Phil. Mag.,
vol. 50, series 5 (1900), pp. 157-75.

Pearson's criterion of goodness of fit gives the probability that a series
as likely or less likely than the observed series will arise in taking a
random sample. To give a concrete illustration, consider the following
simple case of the frequency distribution obtained¹ by throwing 1536
sets of 7 coins per set and noting the number of heads in each throw:


Number of heads       |  0 |  1 |   2 |   3 |   4 |   5 |  6 |  7
Observed frequency    | 12 | 78 | 270 | 456 | 386 | 252 | 69 | 13
Theoretical frequency | 12 | 84 | 252 | 420 | 420 | 252 | 84 | 12


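The important problem of sampling is to determine the probability,
in taking another sample, of obtaining frequencies which deviate as
much as or more than the observed sample from the theoretical frequencies.
To give a precise formulation of the problem, we may state that the
extent of deviations is evaluated in terms of a function $\chi^2$ defined
below as the sum of the ratios of squares of deviations to corresponding
theoretical frequencies.

The method of application of Pearson's criterion of goodness of fit
may be stated as follows:²

Let

$$\chi^2 = \sum_t \frac{(m'_t - m_t)^2}{m_t}, \qquad (1)$$

where $m_t$ and $m'_t$ are defined above.

The probability that a random sample would exhibit as large or larger
deviations from the theoretical than correspond to an assigned $\chi$ is

$$P = \sqrt{\frac{2}{\pi}} \int_\chi^\infty e^{-\frac{1}{2}\chi^2}\,d\chi + \sqrt{\frac{2}{\pi}}\, e^{-\frac{1}{2}\chi^2} \left( \frac{\chi}{1} + \frac{\chi^3}{1 \cdot 3} + \frac{\chi^5}{1 \cdot 3 \cdot 5} + \cdots + \frac{\chi^{n-3}}{1 \cdot 3 \cdot 5 \cdots (n-3)} \right) \qquad (2)$$

if n is even, and

$$P = e^{-\frac{1}{2}\chi^2} \left( 1 + \frac{\chi^2}{2} + \frac{\chi^4}{2 \cdot 4} + \frac{\chi^6}{2 \cdot 4 \cdot 6} + \cdots + \frac{\chi^{n-3}}{2 \cdot 4 \cdot 6 \cdots (n-3)} \right)$$

if n is odd.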

Example 5. Let us test the agreement of theory and observation as to the
number of heads in the results of tossing sets of seven coins, where we have
the observed and theoretical frequencies given above in throwing 1536 sets
of 7 coins.


m'            |  12 |   78 |   270 |   456 |   386 | 252 |    69 |   13
m             |  12 |   84 |   252 |   420 |   420 | 252 |    84 |   12
m' - m        |   0 |   -6 |    18 |    36 |   -34 |   0 |   -15 |    1
(m' - m)^2    |   0 |   36 |   324 |  1296 |  1156 |   0 |   225 |    1
(m' - m)^2/m  |   0 | .429 | 1.286 | 3.086 | 2.752 |   0 | 2.679 | .083


Then $\chi^2$ = 10.315.
Turning to Elderton's tables (loc. cit., p. 26) for n' = 8, we find P = .172
and we conclude that in the long run, approximately 17 times per 100 trials
will give deviations as great or greater than those observed.

¹ The coins were thrown by students in a class in statistics taught by the
writer.
² Tables of the values of P by W. P. Elderton are given in Biometrika,
vol. 1, pp. 155-63, and in Tables for Biometricians and Statisticians,
pp. 26-29.
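
Where Elderton's tables are not at hand, P may be computed from the two
series given above. The following Python sketch is offered under stated
assumptions: the names chi_square and pearson_p are hypothetical, the
argument n_classes is the number of class frequencies (the n' of the
tables), and the integral in the formula for even n is evaluated by the
complementary error function. Computed values may differ slightly in the
third decimal from figures read from the tables by interpolation.

```python
import math


def chi_square(observed, theoretical):
    """Sum of squared deviations divided by the theoretical frequencies."""
    return sum((o - t) ** 2 / t for o, t in zip(observed, theoretical))


def pearson_p(chi2, n_classes):
    """Probability of deviations as great as or greater than those observed,
    from Pearson's series; n_classes must be at least 2."""
    chi = math.sqrt(chi2)
    damp = math.exp(-chi2 / 2.0)
    if n_classes % 2 == 1:
        # n odd: 1 + chi^2/2 + chi^4/(2*4) + ... + chi^(n-3)/(2*4*...*(n-3))
        total, term, power = 0.0, 1.0, 0
        while power <= n_classes - 3:
            total += term
            power += 2
            term *= chi2 / power
        return damp * total
    # n even: the integral term, then chi/1 + chi^3/(1*3) + ... + chi^(n-3)/(...)
    integral = math.erfc(chi / math.sqrt(2.0))
    total, term, power = 0.0, chi, 1
    while power <= n_classes - 3:
        total += term
        power += 2
        term *= chi2 / power
    return integral + math.sqrt(2.0 / math.pi) * damp * total


# Example 5: 1536 throws of 7 coins.
OBSERVED = [12, 78, 270, 456, 386, 252, 69, 13]
THEORETICAL = [12, 84, 252, 420, 420, 252, 84, 12]

chi2_coins = chi_square(OBSERVED, THEORETICAL)
p_coins = pearson_p(chi2_coins, len(OBSERVED))
print(f"chi-square = {chi2_coins:.3f}, P = {p_coins:.3f}")
```

The theoretical frequencies here are a priori values, 1536 times the terms
of the binomial with p = 1/2 and seven coins, so that no constants have been
fitted from the sample.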


The theory which underlies the testing of the agreement of theory and
observation in statistical practice is usually more complicated than
that involved in example 5 because the theoretical values are not a
priori most probable values, but are determined by fitting a theoretical
distribution to the sample. This theoretical distribution is usually
some mathematical function suggested by the theory of probability.

Let us assume that we have determined $\chi^2$ and P for such a case by
treating the theoretical values obtained by fitting the sample just as if
these values were the a priori most probable values. In this connection
it is fair to say that judgment as to goodness of fit is based on the
general order of magnitude of P, and not on small differences in its value.
After making use of a certain amount of approximate mathematical
analysis, Pearson¹ concludes, when we are dealing with sufficiently large
numbers to give small probable errors, that: (1) if a curve is a good fit
to a sample as judged by the $\chi^2$ test, to the same fineness of grouping,
it may be used to describe other samples of the same population; (2) if
the curve is a bad fit to a sample, then the curve cannot serve to the
same fineness of grouping to describe other samples from the same
population. That is, we have in the $\chi^2$ test a criterion to determine
whether a given form of frequency curve describes a sample with a certain
degree of fineness of grouping. Furthermore, if the description is good
for a certain fineness of grouping, it is good for all rougher groupings.
In statistical practice we attempt to get good fitting frequency functions
and curves for such groupings as occur in important investigations.
Again, it does not seem to hold that a curve fitting a small sample
well will necessarily be a good fit when the number in the sample is
greatly increased.

The question naturally arises as to the value of P at which we cease 
to call a fit good. It is impossible to fix such a value because the changes 
are gradual. If P = .01, the odds are nearly 99 to 1 against a ran- 
dom sample giving as great, or greater deviations. If P = .1, we antici- 
pate, in the long run, the assigned amount of deviation or more, one 
time in ten under random sampling. The fit should then surely not be 
called bad. But it seems undesirable to assign a value of P for which 
a result must be discarded. 


Example 6. The observed frequency distribution of the number of α-particles
radiated from a disk in an experiment by Rutherford and Geiger² and the
corresponding theoretical values given by the Poisson exponential limit may
be exhibited as follows:


Number of α-particles radiated from
a disk in one eighth minute  |    0 |    1 |     2 |   3 |     4 |    5 |     6 |    7 |     8 |    9 |   10 | 11 | 12-14
Observed frequency m'        |   57 |  203 |   383 | 525 |   532 |  408 |   273 |  139 |    45 |   27 |   10 |  4 |     2
Theoretical frequency m      |   54 |  210 |   407 | 525 |   508 |  394 |   254 |  140 |    68 |   29 |   11 |  4 |     1
m' - m                       |    3 |   -7 |   -24 |   0 |    24 |   14 |    19 |   -1 |   -23 |   -2 |   -1 |  0 |     1
(m' - m)^2                   |    9 |   49 |   576 |   0 |   576 |  196 |   361 |    1 |   529 |    4 |    1 |  0 |     1
(m' - m)^2/m                 | .167 | .233 | 1.415 |   0 | 1.134 | .497 | 1.421 | .007 | 7.779 | .138 | .091 |  0 |     1

$\chi^2$ = 13.882.

¹ Loc. cit., pp. 164-66.   ² Loc. cit.



Turning to Elderton’s tables (loc. cit., p. 27), for n’ = 13, we find 
P = 0.309. We conclude that, in the long run, approximately 31 cases 
per 100 trials in taking random samples will give deviations as great 
or greater than those observed. The inference would surely be that 
the fit of theory and observation is good in this case. 

As a precaution in applying the criterion of goodness of fit in numerical 
cases, it may be stated that small frequencies at the ends or margins of 
the distributions should be grouped together as illustrated in example 
6 in grouping together the frequencies at 12, 13, and 14. Moreover, 
the number of class frequencies n should not be very large; for in this 
case, the test becomes illusory. Fortunately, in practice n is not usually 
large. As another precaution of a general nature, it should be rec- 
ognized that the theory of the method is based on the assumption that 
the deviations in frequencies are a normally distributed system of 
correlated variables. 

The application of the criterion of goodness of fit has been extended
to regression curves,¹ to cells in contingency tables,² and to other
problems of sampling.


¹ E. Slutsky, "On the Criterion of Goodness of Fit of Regression Lines and on
the Best Method of Fitting Them to Data," Jour. of Roy. Stat. Soc., vol. 77
(1913), pp. 78-84.

² R. A. Fisher, "On the Interpretation of χ² from Contingency Tables, and the
Calculation of P," Jour. Roy. Stat. Soc., vol. 85 (1922), pp. 87-94.

G. Udny Yule, "On the Application of the χ² Method to Association and
Contingency Tables with Experimental Illustrations," Jour. Roy. Stat. Soc.,
vol. 85 (1922), pp. 95-104.

Karl Pearson, "On the χ² Test of Goodness of Fit," Biometrika, vol. 14 (1922),
pp. 186-91. This paper is a rejoinder to Fisher and Yule.

"Further Note on the χ² Test of Goodness of Fit," Biometrika, vol. 14 (1923),
p. 418.


CHAPTER VI 


BERNOULLI, POISSON, AND LEXIS DISTRIBUTIONS 
By H. L. RIETZ 


TYPES OF DISPERSION OF STATISTICAL RATIOS 


Relative frequencies or statistical ratios. One of the simplest and 
most common problems of statistics consists in finding the ratio of the 
number of cases in which an event happens or in which a condition is 
fulfilled, to the total number of cases in question. Thus we find the 
ratio of the number of heads to the number of coins thrown; the ratio 
of the number of male children born to the total number born; the 
ratio of the number of deaths or accidents to the population exposed ; 
the ratio of the number of persons under 21 to the total population, 
and so on. Such statistical ratios or relative frequencies may be 
obtained for repeated trials in a game of chance, for populations of dif- 
ferent districts, or for the same districts at different times, and for a 
great variety of concrete situations. Experience with actual data shows 
that such ratios or relative frequencies obtained from different sets of 
trials exhibit dispersion. It is the purpose of this chapter to present 
methods for comparing the dispersion of statistical ratios with certain 
theoretical norms. The method of comparison involves the classifica- 
tion of a statistical series into sub-series for examination as to stability. 

The dispersion of relative frequencies or statistical ratios. Certain 
criteria have been devised¹ for the comparison of the dispersion of 
relative frequencies found from statistical data with the a priori most 
probable value² of the dispersion of certain norms derived from urn 
schemata. 

For example, we may compare the dispersion of sex ratios in various 
districts with the most probable dispersion of the ratio of the number 


¹ W. Lexis, Zur Theorie der Massenerscheinungen in der menschlichen Gesellschaft 
(1877), p. 22. "Über die Theorie der Stabilität statistischer Reihen," Jahrbuch 
für Nat. Ök. u. Statist., vol. 32 (1879), pp. 60-98. Abhandlungen zur Theorie der 
Bevölkerungs- und Moralstatistik, Kap. V-IX, 1903. 
² The expression "most probable value" means a value more probable than any 
other single value that can be named. 
of heads to the number of coins thrown where the number of coins in a 
set is equal to the number of births in a district. We may likewise 
compare the dispersion of death rates of various districts with the dispersion 
of the binomial distribution $(p + q)^s$, where $q$ is the mean value of the 
death rates and $p = 1 - q$. 

The distributions used as norms are sometimes called: 


(1) Bernoulli or binomial distributions. 
(2) Poisson distributions. 
(3) Lexis distributions. 


Bernoulli or binomial distribution.¹ When the probability of an 
event remains constant throughout any number of sets of drawings, 
each set consisting of $s$ trials, the resulting distribution is called a 
Bernoulli or binomial distribution. 

Thus an urn containing white and black balls is maintained so that 
in drawing a ball the probability of getting a white ball is $p$ and that of 
getting a black ball is $q = 1 - p$. Then the most probable value² of the 
arithmetic mean of the number of white balls among $s$ balls taken at 
random is $sp$, and the most probable value of the standard deviation of the 
relative frequency of white balls, $m/s$, is 

$$\sigma'_B = \sqrt{\frac{pq}{s}}. \tag{1}$$

That is, in drawing $s$ balls one at a time, we get $m_1$ white and $s - m_1$ 
black balls, $m_2$ white and $s - m_2$ black, and so on. Then the a priori 
most probable value of the standard deviation of 

$$\frac{m_1}{s}, \frac{m_2}{s}, \cdots$$

is given by (1). 

From (1), the most probable value of the standard deviation of the 
frequencies $m_1, m_2, \cdots$ would clearly be 

$$\sigma_B = \sqrt{spq}. \tag{2}$$


Example 1. Conceive of drawing 7 balls one at a time from an urn with a 
constant probability 1/2 that a ball will be white. Continue the process by drawing a 
large number of such sets of 7 balls. Then from (2) 

$$\sigma_B = \tfrac{1}{2}\sqrt{7} = 1.323, \tag{3}$$

and the standard deviation of the relative frequency of white balls is 

$$\sigma'_B = \tfrac{1}{14}\sqrt{7} = 0.189. \tag{4}$$


¹ G. Udny Yule, Introduction to the Theory of Statistics, Chap. 15. 
² Cf. The Bernoulli Theorem, p. 16. 
Experiment 1.¹ A set of 7 coins was thrown 1536 times with the following 
distribution of heads per throw: 

Number of heads .     0     1     2     3     4     5     6     7 
Frequencies . . .    12    78   270   456   386   252    69    13 

From these frequencies the standard deviation of the number of heads per throw 
is found to be $\sigma = 1.302$, and the standard deviation of the relative frequency is 

$$\sigma' = 0.186. \tag{5}$$

From (4) and (5), we note the closeness of agreement of theory and experiment. 

The point to be emphasized in this experiment is that the frequency distribution has 
at its foundation a constant probability. 
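
The comparison in Experiment 1 can be retraced numerically. The following is a minimal sketch, not part of the original text: the helper name `freq_sd` is hypothetical, and it assumes the standard deviation is taken with divisor N (the total frequency).

```python
import math

def freq_sd(freqs):
    """Standard deviation (divisor N) of the values 0, 1, 2, ... weighted by freqs."""
    n = sum(freqs)
    mean = sum(x * f for x, f in enumerate(freqs)) / n
    var = sum(f * (x - mean) ** 2 for x, f in enumerate(freqs)) / n
    return math.sqrt(var)

s, p = 7, 0.5
sigma_b = math.sqrt(s * p * (1 - p))        # formula (2), about 1.323
sigma_b_rel = math.sqrt(p * (1 - p) / s)    # formula (1), about 0.189

# Observed frequencies of 0, 1, ..., 7 heads in 1536 throws of 7 coins.
heads = [12, 78, 270, 456, 386, 252, 69, 13]
sigma_obs = freq_sd(heads)                  # about 1.302
sigma_obs_rel = sigma_obs / s               # about 0.186

print(round(sigma_b, 3), round(sigma_b_rel, 3))
print(round(sigma_obs, 3), round(sigma_obs_rel, 3))
```

Dividing the standard deviation of the counts by s gives the standard deviation of the relative frequencies, which is the step from (3) to (4).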


Poisson distribution.² When the probability of an event varies from 
trial to trial within a set of $s$ trials, but the several probabilities from one 
set of $s$ trials are identical with those of every other set, we have a Poisson 
distribution. 

Thus let $s$ urns 

$$U_1, U_2, U_3, \cdots, U_s$$

contain white and black balls in such relative numbers that 

$$p_1, p_2, p_3, \cdots, p_s$$

are the probabilities for the respective urns that a ball to be drawn will 
be white. Let 

$$p = \frac{p_1 + p_2 + \cdots + p_s}{s}. \tag{6}$$

Then $sp$ is the most probable value of the arithmetic mean of the number 
of white balls in a set of $s$ in taking one from each urn, and the standard 
deviation $\sigma_P$ of the number of white balls per set of $s$ is related to the 
standard deviation 

$$\sigma_B = \sqrt{spq}$$

of a hypothetical Bernoulli distribution, based on a constant probability 
$p$, by the equation 

$$\sigma_P^2 = spq - \sum_{x=1}^{s} (p_x - p)^2. \tag{7}$$


Hence the standard deviation of a Poisson distribution is less than 
that of a Bernoulli distribution based on a probability p. 


¹ For other experiments which illustrate different types of dispersion, see Arne 
Fisher, Theory of Probabilities (1922), pp. 137-45; also J. M. Keynes, A Treatise on 
Probability (1921), pp. 361-66. 

² E. Czuber, Wahrscheinlichkeitsrechnung, II (third edition, 1921), pp. 38-40; also 
Arne Fisher, loc. cit. (1922), p. 121; C. V. L. Charlier, Vorlesungen über die Grundzüge 
der mathematischen Statistik (1920), p. 30. 
Example 2. Conceive of drawing one ball from each of 7 urns which are 
maintained so that 1/4, 1/3, 5/12, 1/2, 7/12, 2/3, 3/4 are the respective probabilities 
that the balls will be white. Make a record of the number of white balls in the set of 7. 
Then repeat the process until we know the number of white balls in each of a large 
number of such sets of 7. Then from (6), we have $p = \tfrac{1}{2}$. From (7) 

$$\sigma_P^2 = \tfrac{14}{9} = 1.556. \tag{8}$$

$$\sigma_P = 1.247. \tag{9}$$

Then the most probable value of the standard deviation $\sigma'_P$ of the relative frequency 
of white balls is 

$$\sigma'_P = \frac{1.247}{7} = 0.178. \tag{10}$$


Experiment 2. Urn schemata were devised in which each urn contained 12 balls, 
and in which the number of white balls in the respective urns was 3, 4, 5, 6, 7, 8, 9. 
A set of 7 balls was obtained by drawing one ball from each urn. The number of 
white balls in the set of 7 was recorded. Then each ball was returned to the urn 
from which it was drawn, and a second set of 7 was drawn. This process was con- 
tinued until we had 480 sets of 7. The following frequency distribution of white 
balls was obtained from these 480 sets: 


Number of white balls .    0     1     2     3     4     5     6     7 
Frequency . . . . . . .    3    17    75   123   158    83    19     2 

From these figures the standard deviation of the number of white balls per 
set is found to be $\sigma = 1.216$, and the standard deviation of the relative 
frequency of white balls is 

$$\sigma' = \frac{1.216}{7} = 0.174 \pm 0.004. \tag{11}$$

Note from (10) and (11) the closeness of agreement of theory and 
experiment. The point to be emphasized is that the frequency dis- 
tribution of this experiment has at its foundation unequal probabili- 
ties within each set of 7, but one set of 7 has at its foundation the same 
probabilities as any other set of 7. 
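
As a check on Example 2 and Experiment 2, formula (7) can be evaluated exactly with rational arithmetic. This is a sketch under stated assumptions: the function names are hypothetical, and the urn probabilities are taken as 3/12, 4/12, ..., 9/12, as described in Experiment 2.

```python
import math
from fractions import Fraction

def poisson_scheme_variance(probs):
    """Formula (7): s*p*q minus the sum of squared deviations of the p_x from p."""
    s = len(probs)
    p = sum(probs) / s
    return s * p * (1 - p) - sum((px - p) ** 2 for px in probs)

def freq_sd(freqs):
    """Standard deviation (divisor N) of the values 0, 1, 2, ... weighted by freqs."""
    n = sum(freqs)
    mean = sum(x * f for x, f in enumerate(freqs)) / n
    return math.sqrt(sum(f * (x - mean) ** 2 for x, f in enumerate(freqs)) / n)

# Seven urns of 12 balls holding 3, 4, ..., 9 white balls.
probs = [Fraction(k, 12) for k in range(3, 10)]
var_p = poisson_scheme_variance(probs)           # exactly 14/9
direct = sum(px * (1 - px) for px in probs)      # sum of p_x * q_x, the same quantity
sigma_p = math.sqrt(var_p)                       # about 1.247

# Observed frequencies of 0, 1, ..., 7 white balls in the 480 sets.
white = [3, 17, 75, 123, 158, 83, 19, 2]
sigma_obs = freq_sd(white)                       # about 1.216

print(var_p, round(sigma_p, 3), round(sigma_obs, 3))
```

The equality of `var_p` and `direct` shows why (7) holds: the variance of the number of white balls is the sum of the separate products p_x q_x, which is never greater than spq.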

Lexis distribution. A Lexis distribution is obtained when the proba- 
bility of an event is constant from trial to trial within a set, but varies 
from set to set. Thus, we may draw $s$ balls one at a time from an urn 
$U_1$ with constant probability $p_1$ of getting a white ball, from $U_2$ with a 
constant probability $p_2$, $\cdots$, from $U_n$ with constant probability $p_n$. 

Let 

$$p = \frac{p_1 + p_2 + \cdots + p_n}{n},$$

where we have $n$ sets of $s$ instances each. 

When extended to the $n$ sets of $s$ balls each, the most probable value 
of the arithmetic mean of the number of white balls, in sets of $s$ balls 
taken at random, is $sp$, and the most probable value of the standard 
deviation of the $n$ numbers giving the white balls in the sets of $s$ is 
related to the standard deviation $\sqrt{spq}$ of the hypothetical Bernoulli 
distribution, based on the probability $p$, by the equation 

$$\sigma_L^2 = spq + \frac{s^2 - s}{n} \sum_{x=1}^{n} (p_x - p)^2 = spq + (s^2 - s)\sigma_p^2, \tag{12}$$

where $\sigma_p$ is the standard deviation of $p_1, p_2, \cdots, p_n$. 

Hence the standard deviation of a Lexis distribution is greater than 
that of a Bernoulli distribution based on a constant probability $p$. In 
fact, $\sigma_L^2$ may be regarded as the sum of two parts. The first part, 
$spq$, would be present even if the underlying probability were a 
constant $p$. Lexis calls this the ordinary or unessential component. The 
other part, $(s^2 - s)\sigma_p^2$, he terms the physical component. 


Example 3. Consider 9 urns which are so maintained that 1/6, 1/4, 1/3, 5/12, 1/2, 
7/12, 2/3, 3/4, 5/6 are the respective probabilities that a ball taken at random from 
an urn will be white. Draw 7 balls at random from each urn. 

Then the arithmetic mean of the probabilities is $p = \tfrac{1}{2}$, and the most probable 
value of the standard deviation of the number of white balls in a set of 7 is given by 
$\sigma_L$ in (12). Thus 

$$\sigma_L = \sqrt{\tfrac{7}{4} + (7^2 - 7)\cdot\tfrac{5}{108}} = 1.922,$$

and the most probable value of the standard deviation of the relative frequency of 
white balls in a set of 7 is 

$$\sigma'_L = \frac{1.922}{7} = 0.275. \tag{13}$$
Experiment 3. Urn schemata were arranged in which each of 9 urns contained 
12 balls, and the number of white balls in the respective urns was 2, 3, 4, 5, 6, 7, 8, 
9, 10. Then 7 balls were drawn one at a time with replacements from each of these 
urns and a record was made of the number of white balls in each set of 7. This 
process was repeated 67 times to give large numbers and thus to reduce probable error 
in our calculation of the standard deviation in the number of white balls in sets of 7. 
The resulting distribution of white balls in a set of 7 was, for the 603 sets of 7, as 
follows: 

Number of white balls .    7     6     5     4     3     2     1     0 
Frequencies . . . . . .   32    78   101    97     …     …     …     … 

From these figures, we find the standard deviation 

$$\sigma = 1.906 \pm .037,$$

and the standard deviation of the relative frequency of white balls is 

$$\sigma' = \frac{1.906}{7} = 0.272. \tag{14}$$


From (13) and (14) we note there is a difference of .003 between the experimental 
result and the a priori most probable value. This difference may reasonably be 
attributed to chance fluctuations. The point to be emphasized in this experiment 
is that the frequency distribution has at its foundation a constant probability within 
each set of 7 but unequal probabilities from one set of 7 to another. 

This experiment may also be regarded as consisting of 9 sets of 469 instances each, 
where the drawing of each ball is regarded as an instance. In this case 

$$\sigma_L^2 = \frac{469}{4} + (469)(468)\cdot\frac{5}{108} = 10{,}279,$$

$$\sigma_L = 101.4.$$

From the experiment we have the following numbers of white balls in 9 sets of 469 
balls each: 
79, 125, 170, 187, 234, 298, 298, 349, 399. 

The standard deviation of these numbers is $\sigma = 100.4$, which agrees well with 
$\sigma_L = 101.4$ when we allow for chance fluctuations. 
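
The two readings of Experiment 3 (603 sets of 7, or 9 sets of 469) both follow from formula (12). The sketch below is illustrative only; `lexis_variance` is a hypothetical helper name, and the nine urn probabilities are assumed to be 2/12, 3/12, ..., 10/12 as in the experiment.

```python
import math
import statistics
from fractions import Fraction

def lexis_variance(probs, s):
    """Formula (12): s*p*q + (s^2 - s) * variance of the set probabilities."""
    n = len(probs)
    p = sum(probs) / n
    var_p = sum((px - p) ** 2 for px in probs) / n
    return s * p * (1 - p) + (s * s - s) * var_p

# Nine urns of 12 balls holding 2, 3, ..., 10 white balls.
probs = [Fraction(k, 12) for k in range(2, 11)]

sigma_7 = math.sqrt(lexis_variance(probs, 7))      # sets of 7: about 1.922
sigma_469 = math.sqrt(lexis_variance(probs, 469))  # sets of 469: about 101.4

# White balls observed in the 9 sets of 469 drawings.
counts = [79, 125, 170, 187, 234, 298, 298, 349, 399]
sigma_counts = statistics.pstdev(counts)           # about 100.4

print(round(sigma_7, 3), round(sigma_469, 1), round(sigma_counts, 1))
```

Because the physical component grows like s² while the ordinary component grows like s, the larger sets are dominated almost entirely by the variation of the probabilities from urn to urn.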


THE CRITERIA OF LEXIS AND CHARLIER 


The Lexis ratio. Let $\sigma'$ be the standard deviation of a series of 
relative frequencies obtained by experiment or from statistical data when 
the probability $p$ is known. Next, find the dispersion 

$$\sigma'_B = \sqrt{\frac{pq}{s}}$$

on the hypothesis that we have a Bernoulli distribution. 

The ratio 

$$L = \frac{\sigma'}{\sigma'_B} = \frac{\sigma}{\sigma_B}$$

is called the Lexis ratio, where $\sigma = s\sigma'$ and $\sigma_B = s\sigma'_B$. 

When $L = 1$,¹ the distribution is said to have normal dispersion. 
When $L < 1$, the distribution is said to have subnormal dispersion. 
When $L > 1$, the distribution is said to have supernormal dispersion. 
Thus, in experiment 1, we find from (4) and (5), 

$$L = \frac{0.186}{0.189} = 0.984,$$

and the dispersion seems to be approximately normal. 
In experiment 2, we find from (4) and (11), 

$$L = \frac{0.174}{0.189} = 0.921,$$

and the dispersion seems to be slightly subnormal. 
In experiment 3, we find from (4) and (14), 

$$L = \frac{0.272}{0.189} = 1.439,$$

and the dispersion is supernormal. 


¹ This equality means $L = 1$ except for chance fluctuations. 

The probable error of $L$ in these experiments is given by $\pm\,0.476\,L/\sqrt{n}$, 
where $n$ is the number of sets, 
but this does not hold when $p$ is estimated from the sample. 


The Charlier coefficient of disturbancy. For a Lexis distribution, we 
have from (2) and (12) the relation 

$$\sigma_L^2 = \sigma_B^2 + (s^2 - s)\sigma_p^2, \tag{15}$$

where $\sigma_p$ is the standard deviation of the probabilities from set to set. 

From (15), 

$$\sigma_p^2 = \frac{\sigma_L^2 - \sigma_B^2}{s^2 - s}. \tag{16}$$

Based on $\sigma_p$ as a measure of variability of probabilities, we may 
introduce a coefficient of variability 

$$\frac{\sigma_p}{p},$$

where $p$ is the arithmetic mean of the probabilities. 

From (16), we have 

$$\frac{\sigma_p^2}{p^2} = \frac{\sigma_L^2 - \sigma_B^2}{p^2(s^2 - s)} = \frac{\sigma_L^2 - \sigma_B^2}{s^2 p^2} \tag{17}$$

approximately, since we may for the present purpose neglect $s$ in 
comparison with $s^2$. 

Then 

$$\frac{\sigma_p}{p} = \frac{\sqrt{\sigma_L^2 - \sigma_B^2}}{M}, \tag{18}$$

where $M = sp$ is the most probable value of the mean number of 
happenings in $s$ trials. 

If $\sigma_L$ in (18) is replaced by the actual standard deviation $\sigma$ of any 
given statistical distribution, we have the Charlier coefficient of 
disturbancy: 

$$100\rho = \frac{100\sqrt{\sigma^2 - \sigma_B^2}}{M}. \tag{19}$$

The Charlier coefficient $100\rho$ is obviously zero for a Bernoulli 
distribution except for fluctuations in sampling. It is positive for 
supernormal distributions, and imaginary for subnormal distributions. 
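
The sign behavior of the Charlier coefficient can be seen by applying formula (19) to the urn experiments above. This is a sketch only: `charlier_coefficient` is a hypothetical name, and complex arithmetic is an implementation choice made here so that the subnormal case returns an imaginary number instead of raising an error.

```python
import cmath

def charlier_coefficient(sigma, sigma_b, mean_count):
    """100*rho of formula (19), returned as a complex number."""
    return 100 * cmath.sqrt(sigma ** 2 - sigma_b ** 2) / mean_count

SIGMA_B = 1.323   # formula (3): sets of 7 with p = 1/2
MEAN = 3.5        # M = s*p

coef_exp3 = charlier_coefficient(1.906, SIGMA_B, MEAN)  # Lexis scheme: real and positive
coef_exp2 = charlier_coefficient(1.216, SIGMA_B, MEAN)  # Poisson scheme: purely imaginary

print(coef_exp3)
print(coef_exp2)
```

For Experiment 3 the coefficient is real (roughly 39), whereas for Experiment 2 the quantity under the radical is negative, so only an imaginary value results.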


APPLICATIONS TO STATISTICAL DATA 


Application of the Lexis and Charlier criteria to infantile mortality 
rates of various districts. As an application of the above criteria, let 
us consider a set of infantile mortality rates for various States. In the 
construction of Poisson and Lexis distributions from urn schemata, 
it is assumed that the number of instances in each set is a constant s. 
Experiments with urn schemata can be controlled so that this condition 
is fulfilled, but it is not likely to be fulfilled exactly when we are collect- 
ing actual data of the number of births and deaths. The best we can 
do is to subdivide the total into districts of as nearly equal numbers of 
births as is practicable, or to weight the rates from different districts 
in finding means and standard deviations. 

Let us consider the dispersion of death rates of white infants under 
one year of age in the registration States of the United States.¹ 

Before subjecting the data as a whole to analysis, let us consider the 
death rates of white infants under one year of age in those States in 
which the number of births of white children is between 33,000 and 
58,000. This selection is made so that the number of instances per set 
has only a moderate amount of variability. 


                                            DEATHS UNDER ONE 
STATES                      BIRTHS          YEAR OF AGE PER 
                                            1000 BIRTHS 

California . . . . . .      50,707                70 
Connecticut . . . . . .     33,370                85 
Indiana . . . . . . . .     57,915                78 
Kansas . . . . . . . .        …                   … 
Kentucky . . . . . . .        …                   … 
Minnesota . . . . . . .       …                   … 
North Carolina . . . .        …                   … 
Virginia . . . . . . .        …                   … 
Wisconsin . . . . . . .     54,472                79 
Total . . . . . . . . .    430,454               675 
AM . . . . . . . . . .      47,828                75 

First Method. The simple AM of the death rates is 75 per thousand, and their 
standard deviation (without weighting) is 5.72 per thousand. 
If these infantile death rates constituted a Bernoulli distribution with a number 
of instances equal to the average population, 47,828 in each case, we should have 

$$\sigma'_B = \sqrt{\frac{pq}{s}} = \sqrt{\frac{(.075)(.925)}{47{,}828}} = .00120 \text{ per person} = 1.20 \text{ per thousand}.$$


Hence the Lexis ratio is 

$$L = \frac{5.72}{1.20} = 4.77.$$

The Charlier coefficient of disturbancy is 

$$100\rho = \frac{100\sqrt{(5.72)^2 - (1.20)^2}}{75} = 7.45.$$

Hence the dispersion is supernormal, and the inference is that there is a significant 
variation in infantile mortality among these States. The interpretation of the result 
would no doubt involve a consideration of the factors which produce the variation. 
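
The First Method can be followed step by step in a few lines. The sketch below is not from the source: the function name `bernoulli_rate_sd` is hypothetical, and because it carries the unrounded Bernoulli dispersion forward, its results differ slightly in the last figure from those printed above.

```python
import math

def bernoulli_rate_sd(q, s):
    """Standard deviation of a rate under a Bernoulli norm, sqrt(p*q/s)."""
    return math.sqrt(q * (1 - q) / s)

mean_rate = 75.0      # simple mean death rate per thousand
sd_rate = 5.72        # simple standard deviation per thousand
mean_births = 47_828  # arithmetic mean number of births per State

sd_bernoulli = 1000 * bernoulli_rate_sd(mean_rate / 1000, mean_births)  # per thousand
lexis_ratio = sd_rate / sd_bernoulli
charlier = 100 * math.sqrt(sd_rate ** 2 - sd_bernoulli ** 2) / mean_rate

print(round(sd_bernoulli, 2), round(lexis_ratio, 2), round(charlier, 2))
```

Both criteria use the same two inputs: the observed dispersion of the rates and the dispersion a constant probability would produce with the same number of instances.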


¹ Birth Statistics for the Registration Area of the United States (1919), p. 37. 
Second Method. By weighting the rates with the number of children exposed, we 
may find the mean death rate per child or the ratio of the total number of deaths 
(as computed from the death rates) to the total infantile population exposed. Then 
the rate per thousand children is, for the above illustration, 

$$1000\,q = \frac{(50{,}707)(70) + (33{,}370)(85) + (57{,}915)(78) + \cdots + (54{,}472)(79)}{430{,}454} = 74.864,$$

or .07486 per child. 

Similarly we may find the standard deviation $\sigma$, where 

$$\sigma^2 = \frac{50707(70 - 75)^2 + 33370(85 - 75)^2 + \cdots + 54472(79 - 75)^2}{430{,}454} - (75 - 74.864)^2.$$

This gives $\sigma = 5.40$ per thousand. 

Based on this method 

$$\sigma'_B = \sqrt{\frac{pq}{s}} = \sqrt{\frac{(.07486)(.92514)}{47{,}828}} = .001203 \text{ per person, or } 1.203 \text{ per thousand}.$$

Hence, by this method, the Lexis ratio is $L = \dfrac{5.40}{1.20} = 4.50$. 

It will be noted that the results given by the first method differ somewhat from 
those given by the second method; but for the usual purpose of drawing inferences 
about the operation of certain disturbing influences, the two methods would almost 
surely lead to the same conclusions.¹ 

Let us next extend our illustration to include all the States included in the Birth 
Statistics for the Registration Area of the United States (1919), p. 37. 


                                            DEATHS UNDER ONE 
STATES                      BIRTHS          YEAR OF AGE PER 
                                            1000 BIRTHS 

California . . . . . .      50,707                70 
Connecticut . . . . . .     33,370                85 
Indiana . . . . . . . .     57,915                78 
Kansas . . . . . . . .        …                   … 
Kentucky . . . . . . .        …                   … 
Maine . . . . . . . . .       …                   … 
Maryland . . . . . . .        …                   … 
Massachusetts . . . . .       …                   … 
Michigan . . . . . . .        …                   … 
Minnesota . . . . . . .       …                   … 
New Hampshire . . . . .       …                   … 
New York . . . . . . .        …                   … 
North Carolina . . . .        …                   … 
Ohio . . . . . . . . .        …                   … 
Oregon . . . . . . . .        …                   … 
Pennsylvania . . . . .        …                   … 
South Carolina . . . .        …                   … 
Utah . . . . . . . . .        …                   … 
Vermont . . . . . . . .      7,029                86 
Virginia . . . . . . .        …                   … 
Washington . . . . . .      23,785                62 
Wisconsin . . . . . . .     54,472                79 


¹ For somewhat different methods and for further illustrations, see Mathematical 
Theory of Probabilities, by Arne Fisher (1922), pp. 149-56. 
First Method. Arithmetic mean number of births = 57,430. 
The simple mean death rate per thousand = 1000 q = 79.545. 
The simple standard deviation = 10.24 per thousand, and 

$$\sigma'_B = \sqrt{\frac{pq}{s}} = .001129 \text{ per person, or } 1.129 \text{ per thousand}.$$

The Lexis ratio is 

$$L = \frac{10.24}{1.129} = 9.07.$$

Second Method. Mean death rate per thousand = 83.049. 
Standard deviation = 9.664 per thousand. 

$$\sigma'_B = \sqrt{\frac{pq}{s}} = .001151 \text{ per person, or } 1.151 \text{ per thousand}.$$

The Lexis ratio is 

$$L = \frac{9.664}{1.151} = 8.40.$$


It should be noted that the resulting Lexis ratios differ somewhat 
by the two methods. However, here again for the usual purpose of 
passing judgment on the extent of disturbing influences producing de- 
partures from a constant probability of death, the conclusions drawn 
from the two methods would probably not differ significantly. 

The next step in the analysis of the given death rates would very 
naturally be to consider various subsets of States with a view to finding 
the largest group which may reasonably be regarded as having at its 
foundation a constant probability of death. 


CHAPTER VII 


FREQUENCY CURVES 
By H. C. CARVER 


MODIFICATION OF FREQUENCY MOMENTS 


The method usually employed in fitting an arbitrarily chosen 
frequency function to an observed distribution is the method of moments 
(see Chapter IV). The process depends on obtaining expressions for 
as many “ moments” of the frequency function as there are parameters 
in the function and assuming that these functional moments may be 
equated to the numerical moments as computed from the observed dis- 
tribution. A solution of the resulting equations, theoretically possible, 
gives the parameters. 

In graduating distributions we deal with 

(1) observed distributions of either a continuous or discrete variable, 
and 

(2) certain functions of either a continuous or discrete variable. 

Five cases may arise. 


Case I. In many graduations, the observed distributions are of 
continuous variables and the functions employed are functions of a 
continuous variable. Frequencies are represented by areas. (See page 22.) 

The functional moments obtained by direct integration are 

$$\frac{\int x^n y\,dx}{\int y\,dx}$$

and should be equated to numerical moments of like kind computed from 
the observed data. But such moments could be computed from the 
data of the primary series only, since the process of classification in the 
case of a distribution of a continuous variable gives no clue as to the 
manner in which the items belonging to the various classes are 
distributed between their respective class limits. 

If we choose the class interval as the unit of $x$ and let $x_i$ be the 
mid-abscissa of the $i$th class, then the numerical moments as approximately 
and directly computed from the observed distribution are in reality 
defined by 

$$N\nu'_n = \sum_i x_i^n \int_{-1/2}^{1/2} f(x_i + h)\,dh,$$

where $N$ is the total frequency, while the functional moments are 

$$N\mu'_n = \int_{-\infty}^{\infty} x^n f(x)\,dx.$$


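The relations between $\mu'_n$ and $\nu'_n$, known as Sheppard's adjustments, 
may be developed as follows: 

If $f(x)$ is a continuous function which can be expanded by Taylor's 
theorem, 

$$f(x + h) = \sum_{j=0}^{\infty} \frac{h^j}{j!}\, f^{(j)}(x),$$

then 

$$\int_{-1/2}^{1/2} f(x + h)\,dh = \sum_{j=0}^{\infty} \frac{1}{2^{2j}(2j+1)!}\, f^{(2j)}(x),$$

and therefore 

$$N\nu'_n = \sum_i x_i^n \sum_{j=0}^{\infty} \frac{1}{2^{2j}(2j+1)!}\, f^{(2j)}(x_i).$$

If the distribution is such that the theoretical frequency function and 
all its derivatives vanish at $\pm\infty$, then by the Maclaurin Sum formula, 

$$N\nu'_n = \sum_{j=0}^{\infty} \frac{1}{2^{2j}(2j+1)!} \int_{-\infty}^{\infty} x^n f^{(2j)}(x)\,dx.$$

By successive integration by parts 

$$\int_{-\infty}^{\infty} x^n f^{(2j)}(x)\,dx = n(n-1)\cdots(n-2j+1) \int_{-\infty}^{\infty} x^{n-2j} f(x)\,dx = N\,\frac{n!}{(n-2j)!}\,\mu'_{n-2j}.$$

Therefore 

$$\nu'_n = \mu'_n + \frac{n(n-1)}{2^2 \cdot 3!}\,\mu'_{n-2} + \frac{n(n-1)(n-2)(n-3)}{2^4 \cdot 5!}\,\mu'_{n-4} + \cdots. \tag{1}$$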


Taking the mean as origin, adjusted moments to as high an order as 
necessary should be computed from the first set of relations as given be- 
low, and the last moment of highest order checked by the proper equa- 
tion of the second set. Such procedure automatically checks all adjusted 
moments of inferior order. 
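From (1), by dropping primes to denote moments about the mean, 

$$\begin{aligned}
\mu_2 &= \nu_2 - \tfrac{1}{12} \\
\mu_3 &= \nu_3 \\
\mu_4 &= \nu_4 - \tfrac{1}{2}\mu_2 - \tfrac{1}{80} \\
\mu_5 &= \nu_5 - \tfrac{5}{6}\mu_3 \\
\mu_6 &= \nu_6 - \tfrac{5}{4}\mu_4 - \tfrac{3}{16}\mu_2 - \tfrac{1}{448} \\
\mu_7 &= \nu_7 - \tfrac{7}{4}\mu_5 - \tfrac{7}{16}\mu_3 \\
\mu_8 &= \nu_8 - \tfrac{7}{3}\mu_6 - \tfrac{7}{8}\mu_4 - \tfrac{1}{16}\mu_2 - \tfrac{1}{2304}
\end{aligned} \tag{2, a}$$

$$\begin{aligned}
\mu_2 &= \nu_2 - \tfrac{1}{12} \\
\mu_3 &= \nu_3 \\
\mu_4 &= \nu_4 - \tfrac{1}{2}\nu_2 + \tfrac{7}{240} \\
\mu_5 &= \nu_5 - \tfrac{5}{6}\nu_3 \\
\mu_6 &= \nu_6 - \tfrac{5}{4}\nu_4 + \tfrac{7}{16}\nu_2 - \tfrac{31}{1344} \\
\mu_7 &= \nu_7 - \tfrac{7}{4}\nu_5 + \tfrac{49}{48}\nu_3 \\
\mu_8 &= \nu_8 - \tfrac{7}{3}\nu_6 + \tfrac{49}{24}\nu_4 - \tfrac{31}{48}\nu_2 + \tfrac{127}{3840}
\end{aligned} \tag{2, b}$$

and so on. 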
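
The coefficients in the two sets of relations can be generated mechanically from (1). In the sketch below, which is illustrative and not part of the original text, the general coefficient of $\mu_{n-2j}$ in $\nu_n$ is taken as $\binom{n}{2j}/\{2^{2j}(2j+1)\}$, which is the $j$th term of (1) written with a binomial coefficient; the function names are hypothetical, and the class interval is assumed to be the unit.

```python
from fractions import Fraction
from math import comb

def sheppard_coeff(n, j):
    """Coefficient of mu_(n-2j) in nu_n according to (1), unit class interval."""
    return Fraction(comb(n, 2 * j), 2 ** (2 * j) * (2 * j + 1))

def adjust_moments(nu):
    """Solve (1) recursively, as in the first set of relations.

    nu is a list [nu_0, nu_1, ..., nu_k] of grouped moments about the mean,
    with nu_0 = 1 and nu_1 = 0; the adjusted moments are returned in the
    same order.
    """
    mu = []
    for n, nu_n in enumerate(nu):
        correction = sum(sheppard_coeff(n, j) * mu[n - 2 * j]
                         for j in range(1, n // 2 + 1))
        mu.append(nu_n - correction)
    return mu

# With every nu_n set to zero except nu_0 = 1, the adjusted moments reduce to
# the constant terms of the second set of relations.
constants = adjust_moments([Fraction(1)] + [Fraction(0)] * 8)
print(constants[2], constants[4], constants[6], constants[8])

# Second moment of Distribution I treated later in this chapter.
mu2_dist1 = adjust_moments([1, 0, 6.6168047])[2]
print(round(mu2_dist1, 7))
```

Running the recursion on symbolic or exact inputs is one way to carry out the check recommended above, since an error in any lower moment propagates into the highest one.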

It should be noted that Sheppard's corrections are derived on the 
assumption that the derivatives of the unknown frequency function 
vanish when $x = \pm\infty$. This does not mean that the derivatives of an 
arbitrarily chosen graduating function must necessarily vanish at its 
intermediate range limits. The modification of the observed moments is 
dependent only on assumptions concerning the vanishing of derivatives 
of the unknown law of distribution, and these moments are equated to 
functional moments obtained by integration of a function which con- 
tains the graduating function as a factor. The vanishing of derivatives 
of the graduating function is a detail in finding functional moments, 
and as such is outside the domain of this chapter. 


Case II. In graduating a distribution of a continuous variable by a 
function of a discrete variable (for example, the point binomial, hyper- 
geometric series, etc.), frequencies should be represented by areas, but 
ordinates of the function can be computed at certain discrete points 
only. The frequency areas must be obtained from these ordinates by 
the use of quadrature formulas. 


The computed moments 

$$N\nu'_n = \sum_i x_i^n \int_{-1/2}^{1/2} f(x_i + h)\,dh$$

should be adjusted so that they may be equated to the functional 
moments which are 

$$\sum_{x=-\infty}^{\infty} x^n f(x).$$

But since, subject to the usual restrictions, the theoretical law of 
distribution is continuous, and the function vanishes together with all 
its derivatives at $\pm\infty$, we have 

$$\sum_{x=-\infty}^{\infty} x^n f(x) = \int_{-\infty}^{\infty} x^n f(x)\,dx.$$

Therefore the adjustments for Case II are exactly the same as those 
in Case I. 

Case III. If a distribution of a discrete variable be graduated by a 
function of a continuous variable, and that function, together with all 
its derivatives, vanishes at the limits of its range, then no adjustments 
are necessary. This follows from the Maclaurin Sum formula. 


Case IV. If a distribution of a discrete variable be graduated by a 
function of a discrete variable, the numerical and functional moments 
are at once of the same form, and no adjustments are necessary pro- 
vided discrete classes are not themselves grouped together to form a 
new distribution containing fewer classes. 


Case V. It should be noted that for a distribution of a continuous 
variable the graduated frequencies should be obtained by computing 
areas by either direct integration or quadrature formulas. It is sometimes 
expedient to regard the frequencies of distributions of continuous 
variables as though the frequencies were associated with discrete vari- 
ables. On this assumption no modification of moments would be called 
for (see Cases III and IV) and the ordinates of the graduating function 
may be taken as representing the graduated frequencies. 

Such procedure must be regarded as theoretically unsound, since 
frequencies of a distribution of continuous variables should be repre- 
sented by areas. Practically, the results obtained will differ but little 
by the two methods except in the case where class frequencies lie in the 
abrupt region of extremely skew distributions. 


PROBABLE ERRORS OF FREQUENCY MOMENTS 


This subject, which is of prime importance to an understanding of the 
theory of frequency distributions, has been treated in considerable 
detail in papers¹ by Pearson, Filon, and Sheppard. 


1 W. F. Sheppard, “On the Application of the Theory of Error to Cases of Normal 
Distribution and Normal Correlation,” Phil. Trans. A., vol. 192 (1898), pp. 101-67. 

K. Pearson and L. N. G. Filon, ‘“‘On the Probable Errors of Frequency Constants 
and on the Influence of Random Selection on Variation and Correlation,” Phil. 
Trans. A., vol. 191 (1898), pp. 229-311. 

K. Pearson, ‘On the Probable Errors of Frequency Constants,” Biometrika, vol. 
2 (1903), pp. 273-81. 
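In our treatment of frequency distributions we shall have occasion to 
deal with the following formulas relating to probable errors. 

(a) The standard deviation of errors in the $s$th moment taken about 
the mean is 

$$\sigma_{\mu_s} = \sqrt{\frac{\mu_{2s} - \mu_s^2 + s^2\mu_2\mu_{s-1}^2 - 2s\,\mu_{s-1}\mu_{s+1}}{n}}.$$

(b) If $\alpha_s = \mu_s/\sigma^s$, then 

$$\sigma^2_{\alpha_s} = \frac{1}{n}\left\{\alpha_{2s} - \alpha_s^2 + s^2\alpha_{s-1}^2 - 2s\,\alpha_{s-1}\alpha_{s+1} + \tfrac{1}{4}s^2\alpha_s^2(\alpha_4 - 1) - s\,\alpha_s(\alpha_{s+2} - \alpha_s - s\,\alpha_3\alpha_{s-1})\right\}.$$

(c) If $\beta_1 = \mu_3^2/\mu_2^3$, then $\beta_1 = \alpha_3^2$, $\sigma_{\beta_1} = 2\alpha_3\sigma_{\alpha_3}$, 
and from (b), 

$$\sigma^2_{\beta_1} = \frac{\alpha_3^2}{n}\left\{4(\alpha_6 - 6\alpha_4 + 9) + \alpha_3(35\alpha_3 + 9\alpha_3\alpha_4 - 12\alpha_5)\right\}.$$

(d) If $\beta_2 = \mu_4/\mu_2^2$, i.e. $\beta_2 = \alpha_4$, 
then from (b) 

$$\sigma^2_{\beta_2} = \frac{1}{n}\left\{\alpha_8 + \alpha_4(4\alpha_4^2 - \alpha_4 - 4\alpha_6) + 8\alpha_3(2\alpha_3 + 2\alpha_3\alpha_4 - \alpha_5)\right\},$$

where the probable error in any statistical result $Z$ may be defined as 
equal to $0.6745\,\sigma_Z$ (cf. p. 76). 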


THE NORMAL CURVE 


Referred to the mean of a distribution as origin, the equation of the 
normal curve is 

$$y = \frac{N}{\sigma\sqrt{2\pi}}\, e^{-\frac{x^2}{2\sigma^2}},$$

where $N$ and $\sigma$ represent the total frequency and the standard 
deviation of the distribution (cf. p. 13). This curve is symmetrical about 
the centroidal axis $x = 0$, and $y$ is a maximum at $x = 0$; consequently 
the mean, median, and mode coincide. 

This equation has been developed from various hypotheses. Hagen’s 
demonstration¹ is based on the hypothesis that any observed variation 
from the mean is the algebraic result of an indefinitely large number of 
elementary variations which are of equal infinitesimal magnitude and 
each of which is equally likely to be positive or negative. 


¹ Mansfield Merriman, Method of Least Squares (1910), p. 17. 
Czuber's Wahrscheinlichkeitsrechnung¹ gives a very careful development 
which follows essentially the memoirs of M. W. Crofton.² 

It is desirable to measure deviations from the mean in units of the 
standard deviation of the distribution in order that the distribution it- 
self may be analyzed quite independently of the unit of measurement 
involved. 

Placing, therefore, 

$$t = \frac{x}{\sigma},$$

the equation of the normal curve reduces to 

$$y = \frac{N}{\sigma}\,\phi(t),$$

where 

$$\phi(t) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{t^2}{2}}.$$

It follows, therefore, that if the ordinates of the frequency polygon 
associated with any observed distribution be multiplied by $\sigma/N$ and $f_t$ be 
defined by $f_t = \dfrac{\sigma}{N} f_x$ (with $t = x/\sigma$), the plotted points $(t, f_t)$ will fall 
approximately on the graph of $\phi(t)$ provided the distribution be nearly 
"normal." The distribution $(t, f_t)$ is referred to as a "standard 
distribution." 

Table I exhibits the reduction of two apparently symmetrical 
distributions to their standard distributions. The representation of $(t, f_t)$ as 
coordinates of points in a plane and the corresponding normal curves 
$\phi(t)$ would indicate at a glance that each of the curves closely follows 
the normal law, although it would be difficult to see which of the 
distributions more nearly approximates the law. For more exact criteria 
we may proceed as follows: 


For the normal curve 

$$\alpha_{2s} = \frac{\mu_{2s}}{\sigma^{2s}} = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} t^{2s} e^{-\frac{t^2}{2}}\,dt = 1\cdot 3\cdot 5 \cdots (2s - 1) = \frac{(2s)!}{2^s\, s!}.$$

Since $\phi$ is symmetrical, that is, an even function of $t$, 

$$\alpha_{2s+1} = 0.$$

¹ Ableitung des Fehlergesetzes aus der Hypothese der Elementarfehler, vol. 1 (1914), 
p. 291. 
² Phil. Trans. (1868), and "Probability," Encyclopedia Britannica (1885), vol. 19. 
TABLE I 

              DISTRIBUTION I            DISTRIBUTION II 
   x          t          f_t            t          f_t 
 -10       -3.8957        …          -3.6803      .0023 
  -9       -3.5069        …          -3.3149      .0036 
  -8       -3.1181        …          -2.9494      .0090 
  -7       -2.7294        …          -2.5840      .0164 
  -6       -2.3406        …          -2.2186      .0312 
  -5       -1.9519        …          -1.8531      .0614 
  -4       -1.5631        …          -1.4877      .1291 
  -3       -1.1744        …          -1.1222      .2002 
  -2        -.7856        …           -.7568      .3063 
  -1        -.3969        …           -.3914      .3693 
   0        -.0081        …           -.0259      .4051 
   1         .3806        …            .3395      .3826 
   2         .7694        …            .7050      .3176 
   3        1.1582        …           1.0704      .2271 
   4        1.5469        …           1.4358      .1452 
   5        1.9357        …           1.8013      .0724 
   6        2.3244        …           2.1667      .0345 
   7        2.7132        …           2.5322      .0136 
   8        3.1019        …           2.8976      .0059 
   9        3.4907        …           3.2630      .0013 
  10        3.8794        …           3.6285      .0020 
  11                                  3.9939      .0003 
  12                                  4.3594      .0003 

                  DISTRIBUTION I        DISTRIBUTION II 
  Σ f_x               8585                  10701 
  Σ x f_x              179                    759 
  Σ x² f_x           56809                  80183 
  b                .02085032¹             .070927951 
  ν'_2             6.6172394              7.4930380 
  ν_2              6.6168047              7.4880072 
  σ                2.5723150              2.7364223 
  1/σ              .38875488              .36544067 
  σ/N            .00029962900           .00025571650 

¹ For the meaning of b, see page 24. 
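
The constants listed for Distribution I can be recomputed from its class frequencies. The sketch below is illustrative: it assumes that the observed frequencies of Distribution I are those given in the last column of Table III, and the variable names are hypothetical.

```python
import math

# Assumed observed class frequencies of Distribution I for x = -10, ..., 10
# (taken from the final column of Table III).
observed = [2, 4, 14, 41, 83, 169, 394, 669, 990, 1223, 1329,
            1230, 1063, 646, 392, 202, 79, 32, 16, 5, 2]
freq = dict(zip(range(-10, 11), observed))

N = sum(freq.values())                              # total frequency
sum_xf = sum(x * f for x, f in freq.items())
sum_x2f = sum(x * x * f for x, f in freq.items())

b = sum_xf / N                   # mean, measured from the provisional origin
nu2_prime = sum_x2f / N          # second moment about the provisional origin
nu2 = nu2_prime - b ** 2         # second moment about the mean (unadjusted)
sigma = math.sqrt(nu2)

def t_value(x):
    """Deviation of class mark x from the mean, in units of sigma."""
    return (x - b) / sigma

# Ordinates of the standard distribution: f_t = (sigma / N) * f_x.
f_t = {x: f * sigma / N for x, f in freq.items()}

print(N, sum_xf, sum_x2f)
print(round(b, 8), round(nu2_prime, 7), round(nu2, 7), round(sigma, 7))
print(round(t_value(-10), 4), round(t_value(0), 4))
```

Note that the sum of the ordinates `f_t` equals sigma, so the standard distribution has unit area when the class interval in t is 1/sigma.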
From (b), page 96, we may therefore obtain the following values of $\alpha_s$, 
together with their probable errors: 

TABLE II. CRITERIA FOR THE NORMAL CURVE OF ERROR 

  s      α_s      PROBABLE ERROR OF α_s 
  3        0      .67449 √(6/n)        =  1.652/√n 
  4        3      .67449 √(24/n)       =  3.304/√n 
  5        0      .67449 √(720/n)      =  18.10/√n 
  6       15      .67449 √(6120/n)     =  52.77/√n 
  7        0      .67449 √(124110/n)   =  237.6/√n 
  8      105      .67449 √(1663200/n)  =  869.9/√n 
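
The entries of Table II follow from the formula for the standard deviation of errors in $\alpha_s = \mu_s/\sigma^s$ once the normal values $\alpha_{2s} = 1\cdot 3\cdot 5\cdots(2s-1)$ and $\alpha_{2s+1} = 0$ are inserted. The sketch below is a hedged illustration with hypothetical function names; it evaluates n times the squared standard deviation, that is, the number standing under each radical.

```python
from math import factorial, sqrt

def normal_alpha(k):
    """Standardized k-th moment of the normal curve."""
    if k % 2:
        return 0
    s = k // 2
    return factorial(k) // (2 ** s * factorial(s))

def n_times_variance_of_alpha(s, a=normal_alpha):
    """n * sigma^2 of alpha_s, for a distribution whose standardized moments are a(k)."""
    return (a(2 * s) - a(s) ** 2 + s * s * a(s - 1) ** 2
            - 2 * s * a(s - 1) * a(s + 1)
            + s * s * a(s) ** 2 * (a(4) - 1) / 4
            - s * a(s) * (a(s + 2) - a(s) - s * a(3) * a(s - 1)))

radicands = {s: n_times_variance_of_alpha(s) for s in range(3, 9)}
pe_coefficients = {s: 0.67449 * sqrt(r) for s, r in radicands.items()}

for s in range(3, 9):
    print(s, normal_alpha(s), radicands[s], round(pe_coefficients[s], 3))
```

Dividing each coefficient by the square root of the total frequency n gives the probable error of the corresponding criterion.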
For distributions I and II we compute: 

              DISTRIBUTION I                  DISTRIBUTION II 
  s        μ_s            α_s              μ_s            α_s 
  2     6.5334714       1.0000          7.4046739       1.0000 
  3     -.20783947      -.012445491    -1.4969639       -.074293774 
  4   134.40994         3.1487878     179.68978         3.2772645 
  5    -9.5158173       -.087213982  -107.92005         -.72333137 
  6  4828.4828         17.313254     7914.1779         19.493418 
  7  -369.01209         -.51765046  -6056.2890         -5.4819595 
  8 243628.25         133.70623    400330.38         133.16644 


On the hypothesis that these distributions are normal, the difference 
between the actual and expected values of a, may be compared with 
the probable error of «, as follows: 


DisrrisvTion I Disrrisvtion II 
(1) (1) (2) 
8 Actual —- Ny — Actual — P.B.ofa 
Expected (1) + (2) Expected ‘hae Ses 
3 — .012445 4 .698 — .072938 .01597 
4 -14879 4 4.17 .27726 .03194 
5 — .087214 : 447 — .72333 .1750 
6 2.3133 : 4,06 4.4934 .5101 
7 — .51765 .202 — 5.4820 2.297 
8 28.706 3.06 28.166 8.409 


Since, according to the normal hypothesis, the odds in favor of a
variable lying between

    ± 1 P.E.  are                1 to 1
    ± 2 P.E.                   4.5 to 1
    ± 3 P.E.                    21 to 1
    ± 4 P.E.                   142 to 1
    ± 5 P.E.                  1310 to 1
    ± 6 P.E.                 19200 to 1
    ± 7 P.E.                420000 to 1
    ± 8 P.E.              17000000 to 1
    ± 9 P.E.  about     1000000000 to 1

it is evident that the hypothesis that distribution II is normal is
untenable. The first distribution is more nearly normal and certainly
exhibits no significant skewness.
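
As a rough check on odds of this kind, the probability that a normal deviate falls within $\pm k$ probable errors can be computed from the complementary error function. The sketch below is illustrative (the function name is an assumption); it reproduces the order of magnitude of the conventional odds, which were themselves rounded from older tables, so exact agreement in every row should not be expected.

```python
import math

PE = 0.67449  # probable error of a unit normal deviate


def odds_within(k: float) -> float:
    """Odds in favor of a normal deviate lying within +/- k probable errors."""
    # erfc gives the two-sided tail probability without cancellation error.
    p_outside = math.erfc(k * PE / math.sqrt(2.0))
    return (1.0 - p_outside) / p_outside


if __name__ == "__main__":
    for k in range(1, 10):
        print(k, odds_within(k))
```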
TABLE III

          MID-CLASS                      FREQUENCIES
  x           t          φ(t)       Graduated   Ungraduated
-12       -4.6732       .00001
-11       -4.2844       .00004
-10       -3.8957       .00020           1           2
 -9       -3.5069       .00085           3           4
 -8       -3.1181       .00309          10          14
 -7       -2.7294       .00963          32          41
 -6       -2.3406       .02578          86          83
 -5       -1.9519       .05937         198         169
 -4       -1.5631       .11759         392         394
 -3       -1.1744       .20018         668         669
 -2        -.7856       .29302         978         990
 -1        -.3969       .36872        1231        1223
  0        -.0081       .39892        1331        1329
  1         .3806       .37106        1238        1230
  2         .7694       .29673         990        1063
  3        1.1582       .20400         681         646
  4        1.5469       .12059         403 ¹       392
  5        1.9357       .06128         205         202
  6        2.3244       .02678          89          79
  7        2.7132       .01005          34          32
  8        3.1019       .00325          11          16
  9        3.4907       .00090           3           5
 10        3.8794       .00021           1           2
 11        4.2682       .00004
 12        4.6570       .00001
Total                  2.57230        8585        8585

¹ Actual value 402.46, nearest integral value 402. If 402 be used, the total frequency would amount
to 8584, instead of 8585, due to the fact that nearest integral values have been used in all instances. In
such cases we modify that frequency which is nearest the desired boundary.

The statement that the higher moments, because of their large probable
errors, are idle, is frequently made but not always true. The ratio
of a variation to its probable error is the deciding factor, and in the
above illustration the higher moments reflect information quite as
significant as that presented by the moments of lower order.

With the aid of the table, page 209, distribution I, treated as a normal
distribution of a discrete variable, may be graduated as in Table III
by using
$$\sigma = \sqrt{\nu_2}$$
as explained in Case III of the discussion of the modification of frequency
moments.
Regarded as a distribution of a continuous variable (Case I)
$$\sigma = \sqrt{\mu_2} = \sqrt{\nu_2 - \tfrac{1}{12}}$$
and one may proceed as in Table IV.

TABLE IV

MID-CLASS   LOWER LIMITS OF                                       FREQUENCIES
INTERVAL    CLASS INTERVAL
   x          x         t        ∫φ(t)dt    Difference    Graduated   Ungraduated
 -12       -12.5     -4.8985     .00000
 -11       -11.5     -4.5073     .00000      .00002
 -10       -10.5     -4.1160     .00002      .00008            1           2
  -9        -9.5     -3.7248     .00010      .00033            3           4
  -8        -8.5     -3.3336     .00043      .00120           10          14
  -7        -7.5     -2.9424     .00163      .00374           32          41
  -6        -6.5     -2.5511     .00537      .01002           86          83
  -5        -5.5     -2.1599     .01539      .02308          198         169
  -4        -4.5     -1.7687     .03847      .04572          393         394
  -3        -3.5     -1.3774     .08419      .07783          668         669
  -2        -2.5      -.9862     .16202      .11390          978         990
  -1        -1.5      -.5950     .27592      .14333         1230        1223
   0         -.5      -.2038     .41925      .15512         1332        1329
   1          .5       .1875     .57437      .14423         1238        1230
   2         1.5       .5787     .71860      .11535          990        1063
   3         2.5       .9699     .83395      .07931          681         646
   4         3.5      1.3611     .91326      .04689          403         392
   5         4.5      1.7524     .96015      .02381          204         202
   6         5.5      2.1436     .98396      .01042           89          79
   7         6.5      2.5348     .99438      .00391           34          32
   8         7.5      2.9260     .99829      .00125           11          16
   9         8.5      3.3173     .99954      .00036            3           5
  10         9.5      3.7085     .99990      .00008            1           2
  11        10.5      4.0997     .99998      .00002
  12        11.5      4.4909    1.00000
Total                                       1.00000         8585        8585


TABLE V

                            GRADUATED AS DISTRIBUTION OF A
HEIGHT    UNGRADUATED     Continuous Variable    Discrete Variable
 55-                              .02                   .02
 56-                              .14                   .14
 57-            2                 .67                   .67
 58-            4                2.84                  2.84
 59-           14               10.30                 10.30
 60-           41               32.11                 32.11
 61-           83               86.03                 86.03
 62-          169              198.18                198.17
 63-          394              392.44                392.43
 64-          669              668.13                668.11
 65-          990              977.93                977.92
 66-         1223             1230.60               1230.63
 67-         1329             1331.38               1331.41
 68-         1230             1238.38               1238.41
 69-         1063              990.33                990.33
 70-          646              680.88                680.86
 71-          392              402.46                402.44
 72-          202              204.52                204.51
 73-           79               89.35                 89.35
 74-           32               33.56                 33.56
 75-           16               10.83                 10.84
 76-            5                3.01                  3.01
 77-            2                 .73                   .72
 78-                              .15                   .15
 79-                              .03                   .03
Total        8585             8584.99               8584.99


In order to illustrate the discussion of Case V, Table V is presented, 
showing the results which would have been obtained from Tables III 
and IV had more extensive tables been employed. Actually the dis- 
tribution is one of a continuous variable, but the results obtained by 
regarding it as one of a discrete variable, and consequently omitting the 
modification of moments, differ but very little from the results obtained 
by properly treating it as a distribution of a continuous variable. 
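
The two treatments compared in Table V can be sketched numerically. The following Python fragment is illustrative only: it takes N, b and ν₂ for distribution I from Table I, uses the modified moment and class areas for the continuous treatment, and the unmodified moment and ordinates for the discrete treatment. The function names are assumptions, and the class mark x = 0 corresponds to the height class 67.

```python
import math

N = 8585            # total frequency of distribution I
B = 0.02085032      # mean, in class-interval units (Table I)
NU2 = 6.6168047     # unmodified second moment about the mean (Table I)


def phi(t: float) -> float:
    """Ordinate of the unit normal curve."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def cum(t: float) -> float:
    """Area of the unit normal curve to the left of t."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def graduated_continuous(x: float) -> float:
    """Area of the fitted curve over the class (x - 1/2, x + 1/2)."""
    sigma = math.sqrt(NU2 - 1.0 / 12.0)   # modified second moment
    lower = (x - 0.5 - B) / sigma
    upper = (x + 0.5 - B) / sigma
    return N * (cum(upper) - cum(lower))


def graduated_discrete(x: float) -> float:
    """Ordinate of the fitted curve at the class mark x."""
    sigma = math.sqrt(NU2)                # unmodified second moment
    return N / sigma * phi((x - B) / sigma)


if __name__ == "__main__":
    for x in range(-12, 13):
        print(x, round(graduated_continuous(x), 2), round(graduated_discrete(x), 2))
```

The two columns printed differ only in the second decimal place near the center of the distribution, which is the point the comparison is meant to make.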

It should be borne in mind that according to the criteria it is doubtful 
whether or not this distribution is normal. Distributions satisfying the 
criteria are rare. But if the ratio of the difference between the actual 
and expected values to the probable error is not greater than five or six, 
the results of the graduation are in general sufficiently satisfactory to 
justify the procedure on practical, if not on theoretical, grounds.

After completing the graduation of the distribution by means of the
normal curve, the Pearson χ²-test of goodness of fit (see p. 78) gives


P = .268, 


if we use for the theoretical frequencies the nearest integers to the 
graduated frequencies shown in Table V. This value of P is somewhat 
larger than would probably be obtained if frequencies near the end of 
the range were grouped together (see p. 81). 

In order to provide for situations where the use of the normal curve is
not permissible, an entirely different function or a more general law 
must be employed. 


PEARSON’S GENERALIZED FREQUENCY CURVES 


Certain geometrical properties of unimodal frequency distributions 
suggest that any associated frequency function may be represented as 
a solution of the differential equation 


$$\frac{dy}{dt} = \frac{y\,(a - t)}{f(t)} \tag{1}$$
since

(a) if there be one mode only, there must be a value of $t = a$ for
which the derivative vanishes, and

(b) towards the extremes, as y approaches zero, the derivative must 
also approach zero. 

At present we place no restriction on f(t) except that we assume that 
it may be expanded in a converging power series. Equation (1) may 
then be written as 

$$\frac{1}{y}\frac{dy}{dt} = \frac{a - t}{b_0 + b_1 t + b_2 t^2 + \cdots} \tag{2}$$
where the mean of the distribution is taken as origin, and the abscissae
are measured in units of the standard deviation, as in the discussion of 
the normal curve of error. 

Clearing (2) of fractions, multiplying through by $t^n$, and integrating
over the range $r$ to $s$ with respect to $t$ gives

$$a\int t^n y\,dt - b_0\int t^n\,dy - b_1\int t^{n+1}\,dy - \cdots - \int t^{n+1} y\,dt = 0. \tag{3}$$

But

$$\int_r^s t^n\,dy = \Big[t^n y\Big]_r^s - n\int_r^s t^{n-1} y\,dt,$$

and if the frequency function, when multiplied by $t^n$, vanishes at the
limits of the distribution, $r$ and $s$, we have that

$$\int_{t=r}^{s} t^n\,dy = -\,nN\alpha_{n-1}, \tag{4}$$

where $\alpha_n$ is defined on page 96.
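Giving $n$ successively the values 0, 1, 2, $\cdots$, we obtain from (3)
and (4), noting that $N$, the total frequency, cancels out and that $\alpha_0 = 1$,
$\alpha_1 = 0$, $\alpha_2 = 1$,

$$\begin{aligned}
a + b_1 + \cdots &= 0,\\
b_0 + 3b_2 + \cdots &= 1,\\
a + 3b_1 + 4\alpha_3 b_2 + \cdots &= \alpha_3,\\
a\alpha_3 + 3b_0 + 4\alpha_3 b_1 + 5\alpha_4 b_2 + \cdots &= \alpha_4,\\
a\alpha_n + n\alpha_{n-1}b_0 + (n+1)\alpha_n b_1 + (n+2)\alpha_{n+1}b_2 + \cdots &= \alpha_{n+1}.
\end{aligned} \tag{5}$$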
If we assume that $f(t)$ converges so rapidly that terms involving the
third and higher powers of $t$ may be neglected, a simultaneous solution
of (5) yields

$$a = -\frac{\alpha_3}{2(1+2\delta)},\qquad b_0 = \frac{2+\delta}{2(1+2\delta)},\qquad
b_1 = \frac{\alpha_3}{2(1+2\delta)},\qquad b_2 = \frac{\delta}{2(1+2\delta)}, \tag{6}$$

where
$$\delta = \frac{2\alpha_4 - 3\alpha_3^2 - 6}{\alpha_4 + 3},$$
and the moments are defined by the recurring relation
$$\alpha_{n+1} = \frac{n}{2 - (n-2)\delta}\Big[(2+\delta)\,\alpha_{n-1} + \alpha_3\,\alpha_n\Big]. \tag{7}$$
The value of
$$b_1 = -a = \frac{\alpha_3}{2(1+2\delta)},$$
which represents the distance between the mean and mode expressed
in standard units, is called by Pearson the “skewness” of the distribution.
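
Relations (6) and (7) lend themselves to direct computation. The sketch below is a hedged illustration, not the Handbook's own procedure: the function names are assumptions, and the inputs are simply the standardized third and fourth moments. For a normal parent ($\alpha_3 = 0$, $\alpha_4 = 3$) the recurrence returns the familiar values 15 and 105 for $\alpha_6$ and $\alpha_8$.

```python
def pearson_constants(alpha3: float, alpha4: float) -> dict:
    """Constants of equation (6) from the standardized moments alpha3, alpha4."""
    delta = (2.0 * alpha4 - 3.0 * alpha3 ** 2 - 6.0) / (alpha4 + 3.0)
    d = 2.0 * (1.0 + 2.0 * delta)
    return {
        "delta": delta,
        "a": -alpha3 / d,
        "b0": (2.0 + delta) / d,
        "b1": alpha3 / d,
        "b2": delta / d,
    }


def expected_moments(alpha3: float, alpha4: float, max_order: int = 8) -> dict:
    """Standardized moments up to alpha_max_order from recurrence (7)."""
    delta = pearson_constants(alpha3, alpha4)["delta"]
    alpha = {2: 1.0, 3: alpha3, 4: alpha4}
    for n in range(4, max_order):
        alpha[n + 1] = n / (2.0 - (n - 2) * delta) * (
            (2.0 + delta) * alpha[n - 1] + alpha3 * alpha[n]
        )
    return alpha


if __name__ == "__main__":
    print(pearson_constants(0.58297, 3.69754))
    print(expected_moments(0.0, 3.0))
```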

If $\alpha_3 = \delta = 0$, then by (6) the differential equation
$$\frac{1}{y}\frac{dy}{dt} = \frac{a - t}{b_0 + b_1 t + b_2 t^2}$$
reduces to
$$\frac{1}{y}\frac{dy}{dt} = -t,$$
which on integration yields the normal curve of error,
$$y = y_0\,e^{-t^2/2},$$
which has already been discussed. Pearson refers to this curve as Type
VII, a special case of his generalized frequency curves.
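If $\delta = 0$ but $\alpha_3 \neq 0$, we obtain the differential equation
$$\frac{1}{y}\frac{dy}{dt} = -\,\frac{\dfrac{\alpha_3}{2} + t}{1 + \dfrac{\alpha_3}{2}\,t},$$
yielding after integration Pearson’s Type III,
$$y = y_0\left(1 + \frac{\alpha_3}{2}\,t\right)^{\frac{4}{\alpha_3^2} - 1} e^{-\frac{2t}{\alpha_3}}. \tag{8}$$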

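To determine the constant of integration impose the condition that
the total frequency must equal $N$ and it follows that
$$N = y_0\int_{-2/\alpha_3}^{\infty}\left(1 + \frac{\alpha_3}{2}\,t\right)^{\frac{4}{\alpha_3^2}-1} e^{-\frac{2t}{\alpha_3}}\,dt.$$
Placing
$$z = n\left(1 + \frac{\alpha_3}{2}\,t\right),$$
the above reduces to
$$N = y_0\,\frac{e^{n}}{n^{\,n-\frac12}}\,\Gamma(n)$$
or
$$y_0 = \frac{N\,n^{\,n-\frac12}}{e^{n}\,\Gamma(n)}, \tag{9}$$
where
$$n = \frac{4}{\alpha_3^2}.$$
But since
$$\Gamma(n) = \sqrt{2\pi}\;n^{\,n-\frac12}\,e^{-n + \frac{1}{12n} - \frac{1}{360n^3} + \cdots},$$
(9) reduces to
$$y_0 = \frac{N}{\sqrt{2\pi}}\left[1 - \frac{\alpha_3^2}{48} + \frac{\alpha_3^4}{4608} + \cdots\right]. \tag{9, a}$$

In the above we have taken the standard deviation as the unit of the
independent variable. If the distribution be plotted with the class
interval as the unit, the value of (9, a) must be divided by $\sigma$ and we have
$$y_0 = \frac{N}{\sigma\sqrt{2\pi}}\left[1 - \frac{\alpha_3^2}{48} + \frac{\alpha_3^4}{4608} + \cdots\right], \tag{9, b}$$
which, for purposes of computation, is superior to (9) inasmuch as it does
not require the use of a Gamma Function, and is so very rapidly converging
that ordinarily one need go no further than the terms shown. For
$\alpha_3 = 0$ this constant and Type III curve obviously reduce to the Type
VII functions.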
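
The agreement between the Gamma-function form of $y_0$ and its series form can be checked numerically. This is a minimal sketch under stated assumptions: the function names are illustrative, $n = 4/\alpha_3^2$ as in (9), and the series is truncated after the $\alpha_3^4$ term. Python's `math.lgamma` is used so that large $n$ does not overflow.

```python
import math


def y0_gamma(alpha3: float, total: float = 1.0) -> float:
    """Equation (9): y0 = N n^(n - 1/2) / (e^n Gamma(n)), with n = 4 / alpha3^2."""
    n = 4.0 / alpha3 ** 2
    log_y0 = math.log(total) + (n - 0.5) * math.log(n) - n - math.lgamma(n)
    return math.exp(log_y0)


def y0_series(alpha3: float, total: float = 1.0) -> float:
    """Equation (9, a), truncated after the alpha3^4 term."""
    bracket = 1.0 - alpha3 ** 2 / 48.0 + alpha3 ** 4 / 4608.0
    return total / math.sqrt(2.0 * math.pi) * bracket


if __name__ == "__main__":
    for a3 in (0.2, 0.6, 1.0):
        print(a3, y0_gamma(a3), y0_series(a3))
```

Dividing either value by $\sigma$ gives the constant in class-interval units, as in (9, b).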

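Moments for distributions which obey this law are governed by the
recurring relation
$$\alpha_{n+1} = n\left[\alpha_{n-1} + \frac{\alpha_3}{2}\,\alpha_n\right], \tag{10}$$
which is obtained from (7) by placing $\delta = 0$.
This equation yields the following values of the functional moments,
placing $\dfrac{\alpha_3^2}{2} = y$:
$$\begin{aligned}
\alpha_4 &= 3[1 + y],\\
\alpha_5 &= 2\alpha_3[5 + 3y],\\
\alpha_6 &= 5[3 + 13y + 6y^2],\\
\alpha_7 &= 3\alpha_3[35 + 77y + 30y^2],\\
\alpha_8 &= 7[15 + 170y + 261y^2 + 90y^3],
\end{aligned} \tag{11}$$
etc.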
The probable error of $\alpha_s$, which is equal to $.67449\,\sigma_{\alpha_s}$, may be obtained
from the following relations:
not, = 6 + 18y + 47%, 
$$\begin{aligned}
n\sigma_{\alpha_4}^2 &= 24 + 504y + 1002y^2 + 378y^3,\\
n\sigma_{\alpha_5}^2 &= 720 + 14400y + 54000y^2 + 60750y^3 + 18630y^4,\\
n\sigma_{\alpha_6}^2 &= 6120 + 328800y + 2289150y^2 + 5157600y^3 + 4363830y^4 + 1158300y^5,
\end{aligned} \tag{12}$$
etc.
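
The polynomials for $\alpha_4$, $\alpha_5$ and $\alpha_6$ can be evaluated directly to give probable errors under the Type III hypothesis. The sketch below is illustrative only; the coefficient table and function name are assumptions introduced for the example, with $y = \alpha_3^2/2$ as above.

```python
import math

# Coefficients of n * sigma^2 in ascending powers of y = alpha3^2 / 2.
N_VARIANCE_POLY = {
    4: [24, 504, 1002, 378],
    5: [720, 14400, 54000, 60750, 18630],
    6: [6120, 328800, 2289150, 5157600, 4363830, 1158300],
}


def type3_probable_error(s: int, alpha3: float, n: int) -> float:
    """Probable error of alpha_s (s = 4, 5, 6) for a Type III parent."""
    y = alpha3 ** 2 / 2.0
    n_variance = sum(c * y ** k for k, c in enumerate(N_VARIANCE_POLY[s]))
    return 0.67449 * math.sqrt(n_variance / n)


if __name__ == "__main__":
    for a3 in (0.0, 0.5, 1.0):
        print(a3, [round(type3_probable_error(s, a3, 1), 3) for s in (4, 5, 6)])
```

With `n = 1` the function returns the multiplier of $1/\sqrt{n}$, which is the form in which such values are usually tabulated.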
If a distribution may be represented properly by Type III, the value
of $b_2$ in the equation
$$\frac{1}{y}\frac{dy}{dt} = \frac{a - t}{b_0 + b_1 t + b_2 t^2}$$
should be small, theoretically equal to zero. Actually the value of
$b_2$ as computed from (6) will rarely if ever exactly equal zero, and the
question as to whether or not the computed value is significant may be
answered by comparing its value with the probable error of $b_2$. This
probable error, on the assumption that the distribution does belong to
Type III, may be computed from


8+ 60y+ 847 +277 ; 
o, = : 

From the relations (11), (12), and (13), Tables VI and VII may be
computed. With the aid of these tables one can readily compute for
this curve from the value of $\alpha_3$, by interpolation, the expected higher
moments and their probable errors, and also the probable error of $b_2$.


TABLE VI. TYPE III. TABLE OF α_s FOR CERTAIN VALUES OF α_3

                         α_s
α_3        s = 6        s = 7        s = 8
 0        15.000         .000      105.000
 .1       15.326       10.616      110.996
 .2       16.312       21.931      129.536
 .3       17.986       34.673      162.307
 .4       20.392       49.622      212.215
 .5       23.594       67.641      283.527
 .6       27.672       89.698      382.069
 .7       32.726      116.898      515.481
 .8       38.872      150.509      693.529
 .9       46.246      191.986      928.475
1.0       55.000      243.000     1235.500


TABLE VII. TYPE III. TABLE OF .67449 σ_{α_s} √n FOR CERTAIN
VALUES OF α_3

                    .67449 σ_{α_s} √n
α_3        s = 5        s = 6        s = 8
 0        18.098        52.77        869.9
 .1       18.998        59.65       1106.8
 .2       21.648        78.81         …
 .3       25.974       108.84       2915.1
 .4       31.990       150.67       4756.4
 .5       39.829       206.97       7660.9
 .6       49.727       281.64      12163.9
 .7       62.001       379.56      19033.0
 .8       77.033       506.72      29341.4
 .9       95.259       670.23      43634.5
1.0      117.171       878.51      66697.9

For the distribution of Table VIII one finds

s         ν_s              μ_s              α_s
2        8.6907427        8.6074094        1.000
3        14.721596        14.721596        .58297
4        278.29904        273.98284        3.69754
5        1488.6512        1476.3832        6.79232
6        19353.480        19009.385        29.8093
7        175579.80        172989.69        92.4627
8        2171577.4        2126981.9        387.502


Also $\delta = .05607$,
$b_2 = .0252 \pm .0079$.


Since the value of $b_2$ lies between three and four times its probable
error, it appears that the values of $\alpha_3$ and $\alpha_4$ do not support the
hypothesis that this distribution belongs to Type III, although the evidence
that it does not belong to Type III is not conclusive. By straight-line
interpolation one can obtain the approximate comparison of the actual
and expected values of $\alpha_s$ ($s$ = 4, 5, 6, 7, 8) for (a) $\alpha_3 = .583$ and
(b) $\alpha_3 = .6$ as follows:


                        α_3 = .6
s       EXPECTED α_s      OBSERVED - (4)      P.E. α_s
            (4)                (5)               (6)
4           3.54               .16               .08
5           6.65               .14               .48
6          27.7               2.1               2.72
7          89.7               2.5              17.25
8         382.                6.0             118.


For $\alpha_3 = .58297$, which has a probable error of ± .035, it is noted that
the expected higher moments are for each value of $s$ smaller than the
corresponding observed moments. But from formulas (11) it is noted
that $y$ is always positive and moreover that an increase in any value
of $\alpha_3$ will automatically increase the values of all expected higher
moments. In other words, since $\alpha_3$ is subject to a probable error just as
all other moments, one is not justified in exactly reproducing it at the
expense of all other moments. Thus if we select $\alpha_3 = .6$ as the $\alpha_3$ of
the law of distribution, we note that a much better agreement between
the higher moments is obtained. A slightly higher value of $\alpha_3$ would
give still closer values for the higher moments, but this improvement
would be gained at the expense of $\alpha_3 = .583 \pm .035$.

The addition of the $b_2 t^2$ term of the differential equation, giving rise
to Pearson’s other generalized frequency curves, will not necessarily
remedy the situation. Thus, from the recurring relation of equation
(7) we may compute the higher moments obtaining the following results:


s       ACTUAL α_s      ACTUAL - EXPECTED      P.E. α_s
5          6.79               -.32                .80
6         29.8               -2.3                4.7
7         92.5              -20.                38.4
8        387.              -148.                  …

TABLE VIII


10¢ 


1+.3¢ |uoo (1+.34 oh 106 (1 +.3) = Grap- 


LOS te EDay UATED y 


3} —8| .11584 | — .93614 | — 9.46542 4.26650 | — 2.04137 
9} —7 | .21761 | — .66232 | — 6.69679 3.77544 .23620 2 
46} —6] .381937 | — .49571 | — 5.01218 3.28438 1.42975 27 
167 | —5] .42113 | — .87558 | — 3.79753 2.79332 2.15334 142 
372 | —4 | .52290 | — .28158 | — 2.84709 2.30226 2.61272 410 
718 | —3 | .62466 | — .20436 | — 2.06631 1.81120 2.90244 799 
1186 | —2 | .72642 | — .13881 | — 1.40352 1.32014 3.07417 | 1186 
1462 | —1 | .82819 | — .08187 | — .82780 -82908 3.15873 | 1441 
1498 0} .92995 | — .03154 | — .31890 33802 3.17667 | 1502 
1460 1 | 1.03171 | — .01355 -13701 | — .15304 3.14152 | 1385 
1142 2 | 1.13348 05442 .55025 ) — .64410 3.06370 | 1158 
913 3 | 1.23524 09174 -92759 | — 1.18516 2.94998 891 
642 4 | 1.33701 12613 1.27531 | — 1.62622 2.80664 641 
435 5 | 1.43877 -15800 1.59756 | — 2.11728 2.63783 434 
235 6 | 1.54053 .18766 1.89745 | — 2.60834 2.44666 280 
167 7 | 1.64230 21545 2.17844 | — 3.09940 2.23659 173 
133 8 | 1.74406 -24157 2.44254 | — 3.59046 2.00963 102 
47 9 | 1.84582 .26618 2.69138 | — 4.08152 1.76741 59 
29 10 | 1.94759 .28950 2.92717 | — 4.57258 1.51214 33 
13 11 | 2.04935 31163 3.15093 | — 5.06364 1.24484 18 
9 12 | 2.15112 33266 3.36356 | — 5.55470 -96641 9 
5 13 | 2.25288 35274 3.56659 | — 6.04576 67838 5 
8 14 | 2.35464 .387192 3.76052 | — 6.53682 -88125 2 
2 15 | 2.45641 -39030 3.94637 | — 7.02788 -07604 1 
16 | 2.55817 40793 4,12463 | — 7.51894 — .23776 1 


10701 
Σy    = 10701          ν′_1 = .688347
Σxy   = 7366           ν′_2 = 9.164564
Σx²y  = 98070          ν_2  = 8.6907427
                       σ    = 2.9480065

$y = y_0\,(1 + .3t)^{10\frac{1}{9}}\,e^{-3\frac{1}{3}t}$,  log $y_0$ = 3.15755


The differences between the actual and expected moments are considerably
greater for s = 7 and 8, although the functional higher moments
tend to minimize the seriousness of the variations.

The questions 

(a) precisely how many terms in the denominator of the differential 
equation should be retained? 

(b) are we justified in absolutely reproducing certain moments at 
the expense of others? 

(c) what is the best criterion for goodness of fit? 
may be regarded at the present time as furnishing material for research. 

Table VIII shows the graduation of the distribution on the hypothesis
that the distribution belongs to Type III, taking $\alpha_3 = .6$ and further
treating it as one of a discrete variable. The moments are therefore
unmodified.

If graduated as a distribution of a continuous variable, the moments 
must be modified and the graduated frequencies — being areas — ap- 
proximated by a quadrature formula. 

If it be impossible to graduate a particular distribution by either the
normal curve or Type III, the equation
$$\frac{1}{y}\frac{dy}{dt} = \frac{a - t}{b_0 + b_1 t + b_2 t^2}$$
must be used.

The integration of this equation depends upon the nature of the roots
of the quadratic $b_0 + b_1 t + b_2 t^2$. Thus if the roots be real, the
integration of the equation yields


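$$y = y_0\left(1 + \frac{t}{a_1}\right)^{m_1}\left(1 - \frac{t}{a_2}\right)^{m_2}, \tag{14}$$
and if the roots be complex
$$y = y_0\left(1 + \frac{t^2}{a^2}\right)^{m} e^{\,m_1\tan^{-1}(t/a)}. \tag{15}$$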


For an extended discussion of these curves, including special and
transitional cases, the following sources may be consulted.


(a) “Mathematical Contributions to the Theory of Evolution, II,” by Karl Pearson,
Phil. Trans., A (1895), vol. 185, pp. 343 et seq.

(b) Frequency Curves and Correlation, by W. P. Elderton. C. and E. Layton, London,
1906.

(c) A First Course in Statistics, by D. Caradog Jones. G. Bell and Sons, Ltd.,
London, 1921.

(d) “Mathematical Contributions to the Theory of Evolution.” Second supplement
to a memoir on Skew Variations, by Karl Pearson, Phil. Trans., A (1916), vol.
216, pp. 429-57.


DIFFERENCE EQUATION GRADUATION


The reasoning which prompted Pearson to choose his differential
equation also suggests
$$\frac{\Delta y_x}{\Delta x} = \frac{y_x\,(a - x)}{b_0 + b_1 x + b_2 x^2 + \cdots} \tag{16}$$
as the difference equation of a unimodal distribution.

By arbitrarily allowing $\Delta x$ to represent the difference in magnitudes
of two successive class marks, this element may be considered equal to
unity.

It may be noted that a determination of the values of the constants
of the differential equation (2) does not permit a graduation of a
distribution until the equation has been integrated, yielding a solution of the
form $y = y_0 \cdot f(x)$. In using the difference equation the situation is
otherwise, for as soon as the values of
$$\frac{y_{x+1}}{y_x} = 1 + \frac{\Delta y_x}{y_x}$$
are computed, these ratios permit a calculation of a series of ordinates
which are proportional to those required of the graduation. The
condition that the sum of the graduated ordinates must equal that of the
ungraduated will always determine the proper factor of proportionality.
Therefore, since the difference equation requires no integration in
the course of the graduation, the idea of type, an outgrowth of
integration, is of secondary importance.
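
The procedure just described, building ordinates from successive ratios and then rescaling to the observed total, can be sketched as follows. The helper names are assumptions made for the illustration. As a check that needs no data, the ratio $1 + (a - x)/(b_0 + b_1 x)$ with $a = \lambda - 1$, $b_0 = b_1 = 1$ equals $\lambda/(x+1)$, so the sketch should then reproduce Poisson frequencies.

```python
import math


def pearson_ratio(a: float, b0: float, b1: float, b2: float = 0.0):
    """Return x -> y_{x+1} / y_x = 1 + (a - x) / (b0 + b1 x + b2 x^2), Delta x = 1."""
    return lambda x: 1.0 + (a - x) / (b0 + b1 * x + b2 * x * x)


def graduate_by_ratios(ratio, x_first: int, x_last: int, total: float) -> list:
    """Ordinates for x_first..x_last built from ratios, rescaled to sum to total."""
    ordinates = [1.0]                      # arbitrary starting ordinate
    for x in range(x_first, x_last):
        ordinates.append(ordinates[-1] * ratio(x))
    scale = total / sum(ordinates)         # factor of proportionality
    return [scale * y for y in ordinates]


if __name__ == "__main__":
    lam = 2.0
    ys = graduate_by_ratios(pearson_ratio(lam - 1.0, 1.0, 1.0), 0, 30, 1000.0)
    for x, y in enumerate(ys[:8]):
        print(x, round(y, 3), round(1000.0 * math.exp(-lam) * lam ** x / math.factorial(x), 3))
```

No integration is involved at any step; only the constants of the difference equation and the total frequency are needed.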
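Writing (16) as
$$(b_0 + b_1 x + b_2 x^2 + \cdots)\,\Delta y_x = (a - x)\,y_x, \tag{17}$$
multiplying through by $x^n$ and summing with respect to $x$ yields
$$b_0\sum x^n\Delta y_x + b_1\sum x^{n+1}\Delta y_x + b_2\sum x^{n+2}\Delta y_x + \cdots = a\sum x^n y_x - \sum x^{n+1} y_x. \tag{18}$$

If the range of the distribution be from $x = -\infty$ to $x = \infty$ and we
assume that the extreme ordinates $y_x$ and also $x^n y_x$ vanish, then
$$\sum x^n\Delta y_x = -\,n\sum x^{n-1}y_x + \frac{n(n-1)}{2!}\sum x^{n-2}y_x - \frac{n(n-1)(n-2)}{3!}\sum x^{n-3}y_x + \cdots.$$
Giving $n$ in (18) successively the values 0, 1, 2, $\cdots$, we obtain
corresponding to (5), selecting the mean as origin,
$$\begin{aligned}
a + b_1 - b_2 + \cdots &= 0,\\
b_0 - b_1 + (3\nu_2 + 1)\,b_2 + \cdots &= \nu_2,\\
a\nu_2 - b_0 + (3\nu_2 + 1)\,b_1 + (4\nu_3 - 6\nu_2 - 1)\,b_2 + \cdots &= \nu_3,\\
a\nu_3 + (3\nu_2 + 1)\,b_0 + (4\nu_3 - 6\nu_2 - 1)\,b_1 + (5\nu_4 - 10\nu_3 + 10\nu_2 + 1)\,b_2 + \cdots &= \nu_4,
\end{aligned} \tag{19}$$
where
$$\nu_n = \frac{\sum x^n y_x}{\sum y_x}.$$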
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If we now set 


and’ 
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Dee te Ge 
5 oc, v2 


? 


peey so 


v2 


(20) 


a solution of (19) yields — keeping no terms in bo + dix + box? + +++ 
of greater than the second degree — 


ms ee ee 1 b, = be —a 
al Pys(le 6) ee bid pn haw 
Ge late ROE OY, ge oe a 
z GE EOE 2(1 + 2 8’) 
TaBLp IX 
: AsscissA MiasurED Provo 
fe sk | -heet te Dal a 
QUENCIES . 2, Ps 
armeiet Yl! Mean=-2 
3 — 8 — 8.6883 243.87 15.812 10 
9 -—7 — 7.6883 235.73 38.269 6.1597 154 
46 -6 — 6.6883 | 229.58 62.727 3.6600 950 
167 —5 — 5.6883 225.44 89.185 2.5207 3477 
372 —4 — 4.6883 | 223.29 117.64 1.8980 8789 
718 -—3 — 3.6883 223.14 148.10 1.5067 16683 
1186 —2 — 2.6883 | 225.00 180.56 1.2461 25136 
1462 -—1 — 1.6883 | 228.85 215.02 1.0643 31322 
1498 0 — .6883 | 234.71 251.47 -9333 33338 
1460 1 .3117 | 242.56 289.93 .8366 31115 
1142 2 1.38117 | 252:41 330.39 -7640 26031 
913 3 2.3117 | 264.27 372.85 .7088 19887 
642 4 3.3117 | 278.12 417.31 -6665 14096 
435 5 4.3117 | 293.98 463.76 .6339 9394 
235 6 6.3117 | 311.83 512.22 .6088 5955 
167 7 6.3117 | 311.68 562.68 -5895 3625 
133 8 Goll (| 9353.04 615.14 5747 2137 
47 9 8.3117 | 377.39 669.60 .5636 1228 
29 10 9.3117 | 403.25 726.05 .5554 692 
13 11 10.3117 | 481.10 784.51 5495 384 
9 12 11.3117 460.95 844.97 5455 211 
59 13 12.3117 | 492.81 907.43 5431 115 
8 14 13.3117 | 526.66 971.89 5419 63 
2 15 1431174) 562.52 1038.34 .5417 34 
16 15.3117 | 600.37 1106.80 5424 18 
17 16.3117 | 640.23 1177.26 .5438 10 
18 17.3117 | 682.08 1249.72 .5458 5 
10701 234859

It follows that we may therefore write
$$\frac{y_{x+1}}{y_x} = \frac{x^2 + c_1 x + c_2}{x^2 + c_3 x + c_4}, \tag{22}$$
where 
v3\ lL 


2 
Cc -—-l-— 1-— = = is 
1 ( 5”? C3 iets Ie 53 


a = vi( 1 +2), C4 = Co + 63 — 1. 


Table IX illustrates difference equation graduation. The results 
are practically identical with those obtained by using Pearson’s gen- 
eralized curve IV, which is the curve which would be used according to 
Pearson’s criteria. 

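Graduation of the stump of a distribution. From equation (18) we
obtain the equations
$$\begin{aligned}
b_0\sum\Delta y_x + b_1\sum x\,\Delta y_x + b_2\sum x^2\Delta y_x &= a\sum y_x - \sum x\,y_x,\\
b_0\sum x\,\Delta y_x + b_1\sum x^2\Delta y_x + b_2\sum x^3\Delta y_x &= a\sum x\,y_x - \sum x^2 y_x,\\
b_0\sum x^2\Delta y_x + b_1\sum x^3\Delta y_x + b_2\sum x^4\Delta y_x &= a\sum x^2 y_x - \sum x^3 y_x,\\
b_0\sum x^3\Delta y_x + b_1\sum x^4\Delta y_x + b_2\sum x^5\Delta y_x &= a\sum x^3 y_x - \sum x^4 y_x.
\end{aligned} \tag{24}$$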

To graduate the stump of a distribution, or even an entire distribu- 
tion, these equations may be used directly without restricting the 
origin to the mean. 

If it be desired to graduate that portion of a distribution lying between
$x = r$ and $x = s$, the summations must be from $x = r$ to $x = s - 1$,
since $\Delta y_{s-1}$ involves $y_s$. Clearly we could not sum to $x = s$, since such
procedure would require a knowledge of the frequency at $x = s + 1$,
which is contrary to hypothesis.

It is not necessary to compute by the long method two sets of moments,
that is, $\sum x^n y_x$ and $\sum x^n\Delta y_x$, since
$$\begin{aligned}
\sum\Delta y_x &= y_s - y_r,\\
\sum x\,\Delta y_x &= (s-1)\,y_s - (r-1)\,y_r - \sum y_x,\\
\sum x^2\Delta y_x &= (s-1)^2 y_s - (r-1)^2 y_r - 2\sum x\,y_x + \sum y_x,\\
\sum x^3\Delta y_x &= (s-1)^3 y_s - (r-1)^3 y_r - 3\sum x^2 y_x + 3\sum x\,y_x - \sum y_x,\\
\sum x^4\Delta y_x &= (s-1)^4 y_s - (r-1)^4 y_r - 4\sum x^3 y_x + 6\sum x^2 y_x - 4\sum x\,y_x + \sum y_x,\\
\sum x^5\Delta y_x &= (s-1)^5 y_s - (r-1)^5 y_r - 5\sum x^4 y_x + 10\sum x^3 y_x - 10\sum x^2 y_x + 5\sum x\,y_x - \sum y_x.
\end{aligned}$$
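
These relations are instances of summation by parts and can be verified numerically on arbitrary frequencies. The sketch below is an illustration under stated assumptions: all sums run from $x = r$ to $x = s - 1$, the frequencies are arbitrary integers (so the comparison is exact), and the function names are not from the source.

```python
import random
from math import comb


def sum_x_pow_delta(n: int, y: dict, r: int, s: int) -> int:
    """Left side: sum of x^n * (y_{x+1} - y_x) for x = r .. s-1."""
    return sum(x ** n * (y[x + 1] - y[x]) for x in range(r, s))


def sum_by_parts(n: int, y: dict, r: int, s: int) -> int:
    """Right side: boundary terms plus lower-order sums of x^k * y_x."""
    boundary = (s - 1) ** n * y[s] - (r - 1) ** n * y[r]

    def moment(k: int) -> int:
        return sum(x ** k * y[x] for x in range(r, s))

    # (x - 1)^n - x^n = sum over k < n of C(n, k) (-1)^(n-k) x^k
    return boundary + sum(comb(n, k) * (-1) ** (n - k) * moment(k) for k in range(n))


if __name__ == "__main__":
    random.seed(1)
    r, s = -3, 6
    y = {x: random.randint(0, 1000) for x in range(r, s + 1)}
    for n in range(6):
        print(n, sum_x_pow_delta(n, y, r, s), sum_by_parts(n, y, r, s))
```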

Further examples of graduations of frequency distributions and 
stumps of frequency distributions by the use of difference equations may 
be found in a paper by the writer.' 

1 H.C. Carver, “On the Graduation of Frequency Distributions,” Proceedings of the 
Casualty Actuarial and Statistical Society of America, vol. 6, Part I, No. 18, pp. 52-72. 


(23)


THE GENERALIZED NORMAL CURVE—CHARLIER THEORY


The fact that many distributions do not satisfy “normal” criteria has
led to many generalizations of the normal curve.¹ Of such generalizations,
that of Charlier is particularly noteworthy. In Ueber das Fehlergesetz²
and Die Zweite Form des Fehlergesetzes² Charlier has shown that
on the hypothesis of elementary errors any law of error may assume one
of the following two forms:


Type A.
$$F(x) = A_0\,\phi(x) + A_3\,\phi^{(3)}(x) + A_4\,\phi^{(4)}(x) + \cdots, \tag{1}$$
where
$$\phi(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(x-b)^2}{2\sigma^2}}.$$
Type B.
$$F(x) = B_0\,\psi(x) + B_1\,\Delta\psi(x) + B_2\,\Delta^2\psi(x) + \cdots, \tag{2}$$
where
$$\psi(x) = e^{-\lambda}\,\frac{\sin \pi x}{\pi}\left[\frac{1}{x} - \frac{\lambda}{1!\,(x-1)} + \frac{\lambda^2}{2!\,(x-2)} - \frac{\lambda^3}{3!\,(x-3)} + \cdots\right].$$
If $x$ be a positive integer or zero, the above series in $x$ reduces to
$$\psi(x) = \frac{e^{-\lambda}\lambda^x}{x!}.$$


In Ueber die Darstellung willkürlicher Functionen³ the values of the
coefficients $A_s$, $B_s$, which are independent of $x$, are obtained by imposing
the usual conditions associated with the method of moments.

The function
$$y = \frac{N}{\sigma\sqrt{2\pi}}\left[1 - \frac{\alpha_3}{2}\left(\frac{x}{\sigma} - \frac{x^3}{3\sigma^3}\right)\right]e^{-\frac{x^2}{2\sigma^2}},$$
which has been stressed by Bowley and Edgeworth is equivalent to the
first two terms of the type A function as noted above.
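For purposes of computation the type A curve may be written in the
form
$$F(t) = N\left[\phi(t) - \frac{c_3}{3!}\,\phi^{(3)}(t) + \frac{c_4}{4!}\,\phi^{(4)}(t) - \frac{c_5}{5!}\,\phi^{(5)}(t) + \cdots\right], \tag{3}$$
where
$$\begin{aligned}
c_3 &= \alpha_3,\\
c_4 &= \alpha_4 - 3,\\
c_5 &= \alpha_5 - 10\alpha_3,\\
c_6 &= \alpha_6 - 15\alpha_4 + 30,\\
c_7 &= \alpha_7 - 21\alpha_5 + 105\alpha_3,\\
c_8 &= \alpha_8 - 28\alpha_6 + 210\alpha_4 - 315,
\end{aligned}$$
and in general
$$c_n = \alpha_n - \frac{n^{(2)}}{2\cdot 1!}\,\alpha_{n-2} + \frac{n^{(4)}}{2^2\cdot 2!}\,\alpha_{n-4} - \frac{n^{(6)}}{2^3\cdot 3!}\,\alpha_{n-6} + \frac{n^{(8)}}{2^4\cdot 4!}\,\alpha_{n-8} - \text{etc.},$$
where $n^{(k)} = n(n-1)\cdots(n-k+1)$.

¹ See Chapter III, Part II, Bowley (P. S. King & Son, London, 1920), and Law
of Error, by Edgeworth, Camb. Phil. Trans. (1904), vol. 20.
² Arkiv för Matematik, Astronomi och Fysik, Band 2, No. 8. ³ Loc. cit.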

If series (3) is so rapidly convergent that terms after the third or 
fourth may be neglected, it affords us a simple representation of any 
distribution. 
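
A short numerical sketch of the type A series may make the computation concrete. It rests on the standard identity $\phi^{(k)}(t) = (-1)^k He_k(t)\,\phi(t)$, where $He_k$ is the Hermite polynomial satisfying $He_{k+1} = t\,He_k - k\,He_{k-1}$, so that each term $(-1)^k \frac{c_k}{k!}\phi^{(k)}$ becomes $\frac{c_k}{k!}He_k\,\phi$. The function names are illustrative, and the series is evaluated per unit of $t$.

```python
import math


def charlier_coefficients(alpha: dict) -> dict:
    """c_3 .. c_8 from standardized moments alpha[3] .. alpha[8]."""
    return {
        3: alpha[3],
        4: alpha[4] - 3,
        5: alpha[5] - 10 * alpha[3],
        6: alpha[6] - 15 * alpha[4] + 30,
        7: alpha[7] - 21 * alpha[5] + 105 * alpha[3],
        8: alpha[8] - 28 * alpha[6] + 210 * alpha[4] - 315,
    }


def hermite(k: int, t: float) -> float:
    """Hermite polynomial He_k(t) by the three-term recurrence."""
    if k == 0:
        return 1.0
    previous, current = 1.0, t
    for j in range(1, k):
        previous, current = current, t * current - j * previous
    return current


def type_a(t: float, c: dict, total: float = 1.0, orders=(3, 4)) -> float:
    """Type A series N[phi - c3/3! phi''' + c4/4! phi'''' - ...] at t."""
    phi = math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
    correction = sum(c[k] / math.factorial(k) * hermite(k, t) for k in orders)
    return total * phi * (1.0 + correction)


if __name__ == "__main__":
    normal = {3: 0, 4: 3, 5: 0, 6: 15, 7: 0, 8: 105}
    print(charlier_coefficients(normal))   # all coefficients vanish for a normal parent
```

The `orders` argument selects how many terms of the series are retained, which corresponds to the choice between using one, two, or three terms in a graduation.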

The rapidity with which this series converges, however, depends upon
the extent to which the generating function, $\phi(t)$, is a fair approximation
of the unknown law of distribution. In general, if the distribution be
approximately symmetrical, equation (3) is very rapidly convergent.
If, however, the distribution be bimodal or extremely asymmetrical,
series (3) as a function of the symmetrical generating function, $\phi(t)$,
is for practical purposes of no value because


(a) the number of required terms in (3) would necessitate an
impracticable amount of labor; and


(b) the probable errors of $\alpha_n$ for values of $n$ greater than 5 or 6 are
generally so large that the assumption that we may substitute
the moments computed from the observed data for the
theoretical moments is open to serious criticism.


It should be understood, however, that theoretically equation (3) is
capable of representing any frequency function associated with
graduated variates, but that convergence means one thing and rapid
convergence quite another.

Table IX a shows the results of graduating a distribution by using
$\phi(t)$ as a generating function. Table IX b shows the successive steps
in the computation of the graduated values of the function by the use
of three terms of the series (3). Values of the integral $\int \phi(t)\,dt$, and of
$\phi(t)$ and its derivatives are given in tables on pp. 209-216.

In case series (3) is not sufficiently rapidly convergent, we may proceed
as follows:

Let $\theta_t$ be any arbitrarily chosen function which may be used as a
statistical generating function and which, being a frequency function,
may be expressed by means of equation (3) in the series
$$\theta_t = N\left\{\phi(t) - \frac{d_3}{3!}\,\phi^{(3)}(t) + \frac{d_4}{4!}\,\phi^{(4)}(t) - \frac{d_5}{5!}\,\phi^{(5)}(t) + \cdots\right\}, \tag{4}$$
where $d_s$ is a function of the moments of $\theta_t$.

Eliminating $\phi(t)$ between (3) and (4) we obtain
$$F(t) = \theta_t - \frac{A'_3}{3!}\,\theta_t^{(3)} + \frac{A'_4}{4!}\,\theta_t^{(4)} - \frac{A'_5}{5!}\,\theta_t^{(5)} + \cdots, \tag{5}$$
where
$$\begin{aligned}
A'_3 &= c_3 - d_3,\\
A'_4 &= c_4 - d_4,\\
A'_5 &= c_5 - d_5,\\
A'_6 &= (c_6 - d_6) - 20\,d_3\,(c_3 - d_3),
\end{aligned}$$
etc.


To illustrate, we could select as a generating function 


6; = Yoe 7% (6) 


For p = 2 this becomes the normal curve of error, but 0, as defined 
above would be more general than ¢;. Thus, one can readily select a 
value of p which will make A’, vanish for any distribution. 
To illustrate again, 
6, = Yo sech” ax (7) 


may be selected. This generating function has the same degree of 
freedom as (6) but possesses the added advantage of being integrable 
for integral values of n. Thus (7) does not necessitate the use of 
quadrature when dealing with a distribution of a continuous variable 
which is treated as such. (See Case V, p. 95.) _ 

For skew distributions it is frequently desirable to choose a generat- 
ing function which is essentially skew. Generating functions of this 
type which readily lend themselves to integration are rare. 

If one chooses 


é = y(1+ Aye (8) 


(Pearson’s type III) as the generating function, the coefficient A; will 
automatically disappear if the third moments are equated. However, 
a value of a may be selected which will cause either A, or A;to vanish. 

For a further study of generating functions which tend to make fre- 
quency series more rapidly convergent one may refer to A Number of 
New Generating Functions with Application to Statistics, a dissertation 
submitted in partial fulfillment of the requirements for the degree of 
Doctor of Science in the University of Michigan, 1923, by Emeterio 
Roa. 

Table IX, referred to above, is taken from this dissertation. 

TABLE IX a. GRADUATION OF DISTRIBUTION OF HEIGHTS OF 6441 COLORED SOLDIERS
[Columns: (1) class interval, 146-147 through 198-199; (2) observed frequencies; (3) to (5) graduated results with one, two, and three terms of (3) used. The tabulated values are missing.]

Corresponding to (5) it may be shown that any frequency distribution
can be represented as a series proceeding by differences instead of
derivatives, or

N a B; a B, B; 5 ay 
Foe |” jp OG + Tg A, — Ta + . (9) 

Charlier’s type B curves are special cases of these series. Thus
if the Poisson exponential binomial limit be selected,
$$\theta_x = \psi(x) = \frac{e^{-\lambda}\lambda^x}{x!},$$
we have Charlier’s type B.
But since, for this generating function,
$$\nu_2 = \nu_3$$
and the third moment in practice is rarely as large as the second moment,
a better selection of generating function, with greater freedom, can
usually be found. In general, the point binomial
$$\theta_x = \frac{n!}{x!\,(n-x)!}\,p^{\,n-x}\,q^{\,x}$$
is better, since Poisson’s binomial limit is merely a special case. The
extra degree of freedom obtained by not restricting the generating
function to a limiting value results in a better first approximation to the
observed distribution and hence to a more rapidly converging series.

It must be borne in mind that any series involving derivatives or 
finite difference series can theoretically be used on a distribution of a 
continuous variable only by resorting to mechanical quadrature. (Case 
I, p. 92.) This applies to Charlier’s type B series as well as to Pear- 
son’s generalized curves. 

Practically, such series are, however, entirely satisfactory since the 
distribution may be treated as one of a discrete variable by leaving all 
moments unmodified and permitting the ordinates, instead of areas, 
to represent the graduated frequencies. (Case V, p. 95.) 


CHAPTER VIII 


SIMPLE CORRELATION 
By H. L. RIETZ anp A. R. CRATHORNE 


MEANING OF CORRELATION! 


Let us assume data consisting of pairs of corresponding values
$(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, where we are interested in a
quantitative description of the association of the x’s and the
corresponding y’s. These values may arise from any one of a great
variety of situations. For example, the x’s may be the maximal daily
temperatures of Boston and the y’s the corresponding values for New
York City; the x’s may be statures of fathers and the y’s those of
their oldest sons; the x’s may be the numbers of working hours per day
of a group of laborers and the y’s the corresponding wages of the
group per day.

Scatter-diagrams. If such a set of pairs of numbers is represented
by a system of dots, marking the rectangular coordinates of points,
we obtain a so-called “scatter-diagram.” If we take the origin of
coordinates so as to measure x’s and y’s from their respective mean
values, the scatter-diagram of maximal daily temperatures of
Boston and New York for July, 1920, is shown in Figure 7, where
Boston temperatures are abscissas and New York temperatures are
ordinates. It is clear from the plotting of such data that, with an
assigned value of x, the corresponding value of y may have many values
and thus cannot be accurately predicted by the use of a single-valued
function of x. On the other hand, it is fairly obvious that for an assigned


Fig. 7


1 For definitions of correlation, see Karl Pearson, Drapers’ Company Research
Memoirs, Biometric Series II (1905), p. 9; H. L. Rietz, Annals of Math., vol. 13 (1912),
pp. 187-92; E. V. Huntington, Amer. Math. Monthly, vol. 26 (1919), pp. 423-27;
A. R. Crathorne, Report of National Committee on Mathematical Requirements (1923),
Chap. X, pp. 105-28.
value of x larger than the mean value of x’s, a corresponding y taken
at random is much more likely to be above than below the mean value
of y. In other words, the x’s and y’s are not independent in the
probability sense of independence. There is a tendency for the dots in
Figure 7 to fall into a sort of band which can be fairly well described.

Briefly stated, there is an important field of association between per- 
fect dependence given by a single-valued mathematical function at the 
one extreme and perfect independence in the probability sense at the 
other extreme. The theory of correlation is devoted to the description 
and characterization of this type of association. 


THE CORRELATION COEFFICIENT 


The correlation coefficient r. The most important measure of the
degree of correlation is the so-called Pearsonian coefficient of
correlation, universally represented by the letter r. It is often called the
product-moment coefficient.

Given the pairs of corresponding values

$$(X_1, Y_1),\ (X_2, Y_2),\ \ldots,\ (X_n, Y_n)$$

of the variables X and Y measured from an arbitrary origin, let us
take $\bar{X}$, $\bar{Y}$ for the arithmetic means of the given values of X’s and Y’s
respectively. Then

$$x_i = X_i - \bar{X}, \qquad y_i = Y_i - \bar{Y}$$

are deviations from mean values, and the standard deviations ¹ of the
two series of values are

$$\sigma_x = \sqrt{\frac{1}{n}\sum x_i^2}, \qquad \sigma_y = \sqrt{\frac{1}{n}\sum y_i^2},$$

where the summation indicated by $\sum$ extends from i = 1 to i = n.

Let the same values which are denoted by x in original units of the
data (yards, pounds, kilograms, dollars) be denoted by x′ when they are
measured in the standard deviation $\sigma_x$ as a unit. Similarly, let the
values of y be denoted by y′ when measured in $\sigma_y$ as a unit. That is,

$$x_i' = \frac{x_i}{\sigma_x}, \qquad y_i' = \frac{y_i}{\sigma_y}.$$

Then, in terms of $x_i'$ and $y_i'$, the correlation coefficient r is given by
the simple formula

$$r = \frac{1}{n}\sum x_i' y_i'. \tag{1}$$


1 For the meaning of standard deviation see Chapter 2, p. 27. 
That is, the correlation coefficient of two sets of values, expressed in
their respective standard deviations as units, may be defined as the 
arithmetic mean of the products of deviations of corresponding values 
from their respective means. 

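Although the expression (1) for r is very simple for the purpose of
giving a first notion of the meaning of r, the following formulas, easily
obtained from (1), are usually better adapted to numerical computation:

$$r = \frac{\frac{1}{n}\sum x_i y_i}{\sigma_x \sigma_y} = \frac{\sum x_i y_i}{\sqrt{\sum x_i^2}\,\sqrt{\sum y_i^2}} \tag{2}$$

$$r = \frac{\frac{1}{n}\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sigma_x \sigma_y} \tag{3}$$

$$r = \frac{\frac{1}{n}\sum (X_i Y_i) - \bar{X}\bar{Y}}{\sigma_x \sigma_y} \tag{4}$$

$$r = \frac{\frac{1}{n}\sum (X_i Y_i) - \bar{X}\bar{Y}}{\sqrt{\frac{1}{n}\sum X_i^2 - \bar{X}^2}\;\sqrt{\frac{1}{n}\sum Y_i^2 - \bar{Y}^2}} \tag{5}$$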

Numerical computation of the correlation coefficient. For relatively
small values of n, say n < 30, formula (5) is usually used where X and Y
are the values given in the original ¹ measurements.
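As a minimal sketch of the two routes to r, the following computes the coefficient once from values in standard-deviation units, as in formula (1), and once directly from the original measurements, as in formula (5). The six pairs of temperatures and the function names are hypothetical, chosen only for illustration.

```python
from math import sqrt

def r_standard_units(X, Y):
    """Formula (1): mean product of deviations expressed in standard-deviation units."""
    n = len(X)
    mx, my = sum(X) / n, sum(Y) / n
    sx = sqrt(sum((v - mx) ** 2 for v in X) / n)
    sy = sqrt(sum((v - my) ** 2 for v in Y) / n)
    return sum(((a - mx) / sx) * ((b - my) / sy) for a, b in zip(X, Y)) / n

def r_raw_values(X, Y):
    """Formula (5): computed directly from the original measurements."""
    n = len(X)
    mx, my = sum(X) / n, sum(Y) / n
    numerator = sum(a * b for a, b in zip(X, Y)) / n - mx * my
    denominator = (sqrt(sum(a * a for a in X) / n - mx ** 2)
                   * sqrt(sum(b * b for b in Y) / n - my ** 2))
    return numerator / denominator

# Hypothetical paired observations (not taken from the handbook's table).
X = [70, 76, 79, 85, 88, 94]
Y = [72, 75, 81, 80, 86, 89]

r1 = r_standard_units(X, Y)
r5 = r_raw_values(X, Y)
print(round(r1, 4), round(r5, 4))  # the two values agree
```

Formula (5) avoids forming the deviations one by one, which is why it suits hand computation with a small n.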

If n is a large number, it usually saves labor to construct a so-called 
correlation table from the data as explained below and to organize the 
computations around the table. 

In this connection, it is especially important to remember that the 
formulas (3), (4), and (5) hold when X and Y are measured from any 
arbitrary origin in any unit. 

For dealing with large numbers, formula (4), with the values of X
measured from a class mark near the mean of the X-series, and those
of Y from a class mark near the mean of the Y-series, is usually most
suitable.

Correlation table. A correlation table is simply a double-entry table 
constructed from the given data. In Table I is shown such a table 
from a ten years’ record of the July maximal daily temperatures of Bos- 
ton and New York. Such a table contains a system of columns and a 
system of rows each of which is a frequency distribution. The numbers 


1 J. A. Harris, Amer. Naturalist, vol. 44 (1910), pp. 693-99.
in a column corresponding to an assigned $X = X_s$ form what is often
called an X array of Y’s, and those in a row corresponding to $Y = Y_t$
a Y array of X’s. In this special case, X represents Boston
temperatures and Y, New York temperatures. Any number in a compartment
of this table, say 10 in the column headed 85 and the row marked 79,
is called a cell frequency and indicates that on 10 days of the total 310
days the maximal temperatures of Boston were between 83.5 and 86.5
and those of New York on the same days were between 77.5 and 80.5.
A cell frequency of the s-th column and the t-th row will be denoted by
$n_{st}$.
Form for the calculation of r. In calculating r it is important to have
a systematic form in which to arrange the work to avoid confusion in
the somewhat complicated details. Given the correlation table, we first
add the frequencies in the rows and columns. This gives two total
frequency distributions, the one with respect to temperatures of
Boston in the row marked $f_x$ and the other with respect to the
temperatures of New York in the column marked $f_y$. Arbitrary origins are
taken near the means and the class intervals chosen as units of
measurement. This gives the column headed Y and the row marked X. The
next two columns and the next two rows are self-explanatory and are
used in calculating the means $\bar{X}$ and $\bar{Y}$, and the standard deviations
$\sigma_x$, $\sigma_y$ by the method explained on page 28. We find


$\bar{X} = 0.5871$ in class intervals as the unit, and measured from 79°.
$\sigma_x = 2.8119$ in class intervals as the unit.

$\bar{Y} = -0.3323$ in class intervals as the unit, and measured from 82°.
$\sigma_y = 2.1212$ in class intervals as the unit.


As for the next column, the heading T is an abbreviation for $\sum n_{st}X$,
which means the sum of the products of each cell frequency of the array
t (row) and the corresponding value of X. Thus to find the fourth
number, 76, we have

(−2) × 2  = − 4
(−1) × 1  = − 1
   0 × 3  =   0
   1 × 3  =   3
   2 × 3  =   6
   3 × 6  =  18
   4 × 11 =  44
   5 × 2  =  10
             76 = T for the fourth row.
In a similar way the S marking the next to last row is an abbreviation
of $\sum n_{st}Y$, which means the sum of the products of the cell frequencies
of the array s (column) and the corresponding value of Y. Thus, to
find the third number in the row marked S, we have

(+1) × 1 = +  1
   0 × 1 =    0
(−1) × 0 =    0
(−2) × 2 = −  4
(−3) × 5 = − 15
(−4) × 1 = −  4
(−5) × 1 = −  5
           − 27 = S for the third column.
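The two tallies can be checked mechanically. The sketch below sums value-times-frequency products for the fourth row and the third column, using the pairs read from the worked sums in the text. The final pair (−5, 1) of the third column is an assumption, taken as the entry that brings the column total to the stated −27, and the helper name is hypothetical.

```python
# (X value, cell frequency) pairs for the fourth row of the table.
fourth_row = [(-2, 2), (-1, 1), (0, 3), (1, 3), (2, 3), (3, 6), (4, 11), (5, 2)]

# (Y value, cell frequency) pairs for the third column.
# The last pair (-5, 1) is assumed so that the total matches the stated -27.
third_column = [(1, 1), (0, 1), (-1, 0), (-2, 2), (-3, 5), (-4, 1), (-5, 1)]

def weighted_sum(pairs):
    """Sum of (class value x cell frequency) over one array of the table."""
    return sum(value * freq for value, freq in pairs)

T4 = weighted_sum(fourth_row)
S3 = weighted_sum(third_column)
print(T4, S3)  # 76 -27
```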


The column headed T · Y is formed in an obvious manner by
multiplying together the corresponding numbers in the T column and the Y
column. The lower row marked S · X is formed in a similar manner from
the products of corresponding numbers in the S and the X rows. It
is evident that the total of the last column or of the last row is $\sum(XY)$
(= 1220) for all the 310 entries in the table and hence we have a good
check on the accuracy of the computation.

We have now computed $\bar{X}$, $\bar{Y}$, $\sum(XY)$, $\sigma_x$ and $\sigma_y$, and substitution
in formula (4) gives for the value of the coefficient of correlation

r = 0.6925.
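The substitution can be reproduced in a few lines. This sketch applies the form "mean product minus product of means, divided by the product of the standard deviations" to the quantities quoted above, all in class-interval units; the variable names are illustrative only.

```python
n = 310                               # total number of days in the table
sum_xy = 1220                         # total of the T*Y column (or the S*X row)
mean_x, mean_y = 0.5871, -0.3323      # means, measured from the arbitrary origins
sigma_x, sigma_y = 2.8119, 2.1212     # standard deviations in class intervals

r = (sum_xy / n - mean_x * mean_y) / (sigma_x * sigma_y)
print(round(r, 4))  # 0.6925
```

Because both the numerator and the denominator are in class-interval units, the result is a pure number and needs no conversion back to degrees.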


Probable error of r. If n is a fairly large ¹ number and if the
regression is not far from linear (see p. 126), the probable error of r is given
by the formula

$$PE = 0.6745\,\frac{1 - r^2}{\sqrt{n}}.$$

When applied to the above calculation of r, we have r = 0.6925 ± 0.0199,
or r = 0.693 ± 0.020 if we use three places of decimals.
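A short check of the quoted probable error, assuming the formula PE = 0.6745 (1 − r²)/√n; the function name is hypothetical.

```python
from math import sqrt

def probable_error_r(r, n):
    """Probable error of a product-moment r for fairly large n and near-linear regression."""
    return 0.6745 * (1 - r * r) / sqrt(n)

pe = probable_error_r(0.6925, 310)
print(round(pe, 4))  # 0.0199
```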
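Other formulas for r. It is easily verified by comparison with (1)
that ²

$$r = 1 - \frac{1}{2n}\sum (x_i' - y_i')^2 \tag{6}$$

$$r = -1 + \frac{1}{2n}\sum (x_i' + y_i')^2 \tag{7}$$

$$r = 1 - \tfrac{1}{2}\,\sigma^2_{(x'-y')} = -1 + \tfrac{1}{2}\,\sigma^2_{(x'+y')}, \tag{8}$$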


1 For errors of sampling with small numbers, see Student, Biometrika, vol. 6
(1908-9), pp. 302-10. Also Soper, Young, Cave, Lee, and Pearson, Biometrika, 
vol. 11 (1915-17), pp. 328-413. 

2 See Huntington, Amer. Math. Monthly, vol. 26 (1919), p. 424. 
where $\sigma_{(x'-y')}$ and $\sigma_{(x'+y')}$ are the standard deviations of (x′ − y′) and
(x′ + y′) respectively.

From these formulas, we have at once the important property that 
r is not less than — 1 nor greater than + 1. 

It is easily verified by comparison with formula (2) that

$$r = \frac{\sigma_x^2 + \sigma_y^2 - \sigma^2_{(x-y)}}{2\,\sigma_x\sigma_y}. \tag{9}$$

When n is a small number, the values of r may in certain cases be
obtained very simply from (9).
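The equivalence of these alternative forms can be illustrated numerically. The sketch below assumes the readings r = 1 − Σ(x′ − y′)²/2n, r = −1 + Σ(x′ + y′)²/2n, and r = (σx² + σy² − σ²(x−y))/(2σxσy), and compares each with the mean product of standardized deviations. The data and helper names are hypothetical.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def sd(values):
    """Standard deviation with divisor n, as used throughout the chapter."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / len(values))

# Hypothetical paired observations.
X = [12, 15, 11, 19, 16, 14, 20]
Y = [30, 34, 29, 41, 33, 35, 40]
n = len(X)

sx, sy = sd(X), sd(Y)
xs = [(v - mean(X)) / sx for v in X]   # x' values
ys = [(v - mean(Y)) / sy for v in Y]   # y' values

r_product = sum(a * b for a, b in zip(xs, ys)) / n
r_diff = 1 - sum((a - b) ** 2 for a, b in zip(xs, ys)) / (2 * n)
r_sum = -1 + sum((a + b) ** 2 for a, b in zip(xs, ys)) / (2 * n)
r_variance = (sx ** 2 + sy ** 2 - sd([a - b for a, b in zip(X, Y)]) ** 2) / (2 * sx * sy)

print(round(r_product, 6), round(r_diff, 6), round(r_sum, 6), round(r_variance, 6))
```

Since the sums of squares in the first two forms cannot be negative, the bounds −1 ≤ r ≤ +1 follow directly.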


REGRESSION 


Lines of regression. If we mark the mean temperature for each
column of the “scatter-diagram” (Fig. 8) by a cross, the “line of
regression of Y on X” is the straight line

$$Y = mX + b, \tag{10}$$

which fits “best” ¹ this system of crosses.

The line of regression of Y on X has the property that the sum of
the squares of the distances (measured parallel to the Y-axis) of all the
dots from it is less than from any other straight line. Moreover, the values
of Y computed from the regression line are more highly correlated with
the corresponding observed Y’s than when calculated from any other
linear function of X.

For this reason, the equation of the line of regression of Y on X may 
be regarded as that linear relation which on the whole gives the “ best ”’ 
estimate of Y corresponding to an assigned X in so far as such a pre- 
diction can be made by means of a linear function. 

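It is easily shown that (10) takes the form

$$Y - \bar{Y} = r\,\frac{\sigma_y}{\sigma_x}\,(X - \bar{X}), \tag{11}$$

or $y = r\,\frac{\sigma_y}{\sigma_x}\,x$, where $y = Y - \bar{Y}$ and $x = X - \bar{X}$,
or $y' = r\,x'$, where $x' = \frac{x}{\sigma_x}$ and $y' = \frac{y}{\sigma_y}$.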


1 The term “best” is here used to mean the best under a least-squares criterion of
approximation. In applying the criterion, the squares of distances of the crosses 
from the line are weighted with the number of dots in the corresponding column. 
See G. Udny Yule, Proc. Roy. Soc., vol. 60 (1897), pp. 477-89; Introduction to the 
Theory of Statistics (1915), pp. 168-75. 
Thus, given X the maximal daily temperature of Boston, we have
for the best estimate of the maximal daily temperature Y of New York

$$Y = \bar{Y} + r\,\frac{\sigma_y}{\sigma_x}\,(X - \bar{X}) = 81.00 + .5224\,(X - 80.76) = .5224\,X + 38.81, \tag{12}$$

in so far as such a prediction can be made by means of a linear equation.
This tells us that corresponding to an assigned change of 1 degree in
X, there is on the average a change of .5224 degrees in Y.


[Figure 8: means of columns and means of rows are marked; AB = the regression line of y on x; CD = the regression line of x on y.]

FIG. 8. SCATTER-DIAGRAM FOR DATA OF TABLE I, SHOWING CENTER OF
TABLE, MEANS OF ARRAYS, AND LINES OF REGRESSION


In a similar manner, we find the line of regression of X on Y to be

$$X = \bar{X} + r\,\frac{\sigma_x}{\sigma_y}\,(Y - \bar{Y}) = .9180\,Y + 6.40. \tag{13}$$

It is important to note that the value of X in (13) cannot be obtained by
solving for X in (12).
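The coefficients of both regression lines follow from r, the two standard deviations, and the two means. The sketch below recomputes them from the values quoted in the text; it also shows why one line cannot be obtained by solving the other, since the product of the two slopes is r², not 1. Variable names are illustrative.

```python
r = 0.6925
sigma_x, sigma_y = 2.8119, 2.1212   # class-interval units; only their ratio matters
mean_X, mean_Y = 80.76, 81.00       # degrees: Boston and New York means

# Regression of Y (New York) on X (Boston).
slope_yx = r * sigma_y / sigma_x
intercept_yx = mean_Y - slope_yx * mean_X

# Regression of X (Boston) on Y (New York).
slope_xy = r * sigma_x / sigma_y
intercept_xy = mean_X - slope_xy * mean_Y

print(f"Y = {slope_yx:.4f} X + {intercept_yx:.2f}")   # Y = 0.5224 X + 38.81
print(f"X = {slope_xy:.4f} Y + {intercept_xy:.2f}")   # X = 0.9180 Y + 6.40

# Solving the first line for X would give slope 1/slope_yx, which differs from slope_xy
# unless r = +1 or r = -1.
print(round(1 / slope_yx, 4), round(slope_yx * slope_xy, 4))
```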

When there is no correlation between X’s and Y’s, r = 0; and
except for chance fluctuations, (12) and (13) are parallel to the X and Y
axes respectively. But conversely, when r = 0 it is not necessarily
true that there is no correlation. There may be high correlation ¹ with
non-linear regression when r = 0.

The mean square error of estimate: standard deviation of arrays.
In estimating Y as just shown from the regression equation of Y on X,
it is important to know something about the variability in the Y arrays.
The mean square error in estimating Y’s by taking the means of arrays
of Y’s may be defined as the mean of the squares of the standard
deviations of these arrays, the square of each standard deviation of an array
being weighted with the number in the array. If the means of these
arrays fall exactly on the line of regression of Y’s on X’s, then it can be
proved that

$$s_y^2 = \sigma_y^2\,(1 - r^2), \tag{14}$$

where $s_y^2$ is the mean square error in the estimate of Y. The value $s_y^2$
may also be defined as the mean of the squares of the deviations of
the dots of the scatter-diagram from the line of regression of Y on X,
when distances are measured parallel to the Y-axis.


From (14) we have

$$s_y = \sigma_y\sqrt{1 - r^2}. \tag{15}$$

This value of $s_y$ may be regarded as a sort of average value of the
standard deviations of the arrays of Y’s and is sometimes called the root
mean square error of estimate of Y, or more briefly, the standard error
of estimate of Y. The factor $\sqrt{1 - r^2}$ in (15) has been called the
coefficient of alienation or the measure of the failure to improve the
estimate of Y from knowledge of the correlation.

To illustrate, suppose that r = .6925 be the correlation coefficient of
maximal daily temperatures of Boston and New York. Assuming linear
regression, the root mean square error of estimate of the temperature
of New York from that of Boston would be

$$s_y = \sigma_y\sqrt{1 - r^2} = 0.7214\,\sigma_y. \tag{16}$$

Hence the average variability in the arrays of New York temperatures
which correspond to assigned Boston temperatures is more than .7 as
great as the average variability of all the New York temperatures. We
find therefore that we cannot, with any considerable degree of reliability,
predict on a given day the maximal temperature of New York from that
of Boston. However, with large numbers, we can give a very reliable
prediction of the mean maximal daily New York temperature that
corresponds to an assigned Boston temperature.


1 H. L. Rietz, “On Functional Relations for which the Coefficient of Correlation
is Zero,” Quar. Pub. Amer. Stat. Assoc., vol. 16 (1919), pp. 472-76. 
An analogous discussion of the estimation of X from the means of
the arrays of X’s leads to the equation

$$s_x = \sigma_x\sqrt{1 - r^2}.$$


THE CORRELATION RATIO 


Non-linear regression.¹ The correlation ratio. When the means of
a system of parallel arrays do not lie near a straight line, the regression
is said to be non-linear. We then speak of the curve formed by the
means of arrays of Y’s as the regression curve of Y on X. In the case
of linear regression we found, (14),

$$s_y^2 = \sigma_y^2\,(1 - r^2),$$

or

$$r^2 = 1 - \frac{s_y^2}{\sigma_y^2}. \tag{17}$$

If we think of $s_y$ as measured in terms of $\sigma_y$ as a unit, the discrepancy of
$r^2$ from unity, or the departure from perfect correlation, is measured by
$s_y^2$. When $s_y$ approaches $\sigma_y$, then $1 - \frac{s_y^2}{\sigma_y^2}$ approaches zero. If $s_y^2$ is
small, that is, if the points of the scatter-diagram tend to concentrate in
a narrow band, the expression $1 - \frac{s_y^2}{\sigma_y^2}$ approaches unity. This suggests
the use of $1 - \frac{s_y^2}{\sigma_y^2}$ as a measure of correlation for linear or for non-linear
regression. We then write

$$\eta_{yx}^2 = 1 - \frac{s_y^2}{\sigma_y^2}, \tag{18}$$

where $\eta_{yx}$ is called the correlation ratio ² of y on x. For linear
regression, we have $\eta^2 = r^2$.

An analogous discussion for the arrays of X’s leads to the equation

$$\eta_{xy}^2 = 1 - \frac{s_x^2}{\sigma_x^2}, \tag{19}$$

giving $\eta_{xy}$, the correlation ratio of x on y.

In general we may say that the correlation ratio is a measure of the 
concentration of the dots of the scatter-diagram about the regression 
curve. 
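A small constructed example may make the distinction between r and the correlation ratio concrete. The sketch below uses hypothetical arrays whose means lie on a parabola, so that r is zero while the dots still cluster closely about the regression curve. It computes η² both as one minus the weighted mean array variance over the total variance and as the variance of the array means over the total variance.

```python
from math import sqrt

# Hypothetical X arrays of Y's: each key is an X value, each list the Y's observed with it.
arrays = {-2: [4, 5], -1: [1, 2], 0: [0, 1], 1: [1, 2], 2: [4, 5]}

all_y = [y for ys in arrays.values() for y in ys]
all_x = [x for x, ys in arrays.items() for _ in ys]
n = len(all_y)
mean_y = sum(all_y) / n
mean_x = sum(all_x) / n
var_y = sum((y - mean_y) ** 2 for y in all_y) / n
var_x = sum((x - mean_x) ** 2 for x in all_x) / n

# Mean square error about the array means (array variances weighted by array size).
s2_y = sum(sum((y - sum(ys) / len(ys)) ** 2 for y in ys) for ys in arrays.values()) / n
eta2_from_residual = 1 - s2_y / var_y

# Weighted variance of the array means about the general mean.
var_means = sum(len(ys) * (sum(ys) / len(ys) - mean_y) ** 2 for ys in arrays.values()) / n
eta2_from_means = var_means / var_y

# Product-moment r for the same dots.
cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(all_x, all_y)) / n
r = cov / sqrt(var_x * var_y)

print(round(r, 4), round(eta2_from_residual, 4), round(eta2_from_means, 4))
```

Here r vanishes because of the symmetry of the arrays, yet η² is above 0.9, which is the situation described as high correlation with non-linear regression.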

1 Karl Pearson, “On the General Theory of Skew-Correlation and Non-linear
Regression,” Drapers’ Company Research Memoirs, Biometric Series II (1905).

2 Karl Pearson, loc. cit., p. 10. 
By a slight transformation, (18) and (19) can be put into the forms

$$\eta_{yx}^2 = \frac{\sigma_{m_y}^2}{\sigma_y^2}, \qquad \eta_{xy}^2 = \frac{\sigma_{m_x}^2}{\sigma_x^2}, \tag{20}$$

where $\sigma_{m_y}$, $\sigma_{m_x}$ are the standard deviations ¹ of the means of the X
arrays of Y’s and of the Y arrays of X’s respectively. The expression
“correlation ratio” probably had its origin in the ratios given in (20).

Form for the calculation of correlation ratios. Referring to the
notation used in the computation of the correlation coefficient it can easily
be shown ² that the two correlation ratios can be expressed in the form

$$\eta_{yx}^2 = \frac{1}{\sigma_y^2}\left[\frac{1}{n}\sum\left(\frac{S^2}{f_x}\right) - \bar{Y}^2\right], \tag{21}$$

$$\eta_{xy}^2 = \frac{1}{\sigma_x^2}\left[\frac{1}{n}\sum\left(\frac{T^2}{f_y}\right) - \bar{X}^2\right]. \tag{22}$$


To compute the two ratios in connection with the correlation coefficient,
it is then necessary to add two columns and two rows to the form, p. 123,
for the computation of r. For example, in the temperature problem
the two columns and two rows are given below:


$T^2$      $T^2 \div f_y$      $S^2$      $S^2 \div f_x$
CRETE od ete he ay ot LOGO LOOT. rete her a) oS LoS 
4480 Peet cd tet wel ine M ane OLO-OL Can ee vere) See 
1681 . 152.82 729. . SP Ao weeia!.  Whieur 
5776 6 186.32 F2401 5.0.) wt te 96.04 
5625 “ 125.00 729. « ‘ 30.38 
(Pa, Wace ea RE", 1936.4 (ares es a) ts ie er OU OU 
AOE Mudie a Nie ans Team ae O74" 1036.20. es cee Oars 
LOO Gein ous. Meme saved 200.00 256 car ctats arate «eae 6.10 
$$\frac{1}{n}\sum\left(\frac{S^2}{f_x}\right) = \frac{746.60}{310}.$$

Having summed up the column marked $T^2 \div f_y$ and the row marked


1 In finding these standard deviations, the squares of the deviations of the means
of arrays from the mean of all values are weighted with the numbers in the arrays.

2 A. R. Crathorne, “Calculation of the Correlation-Ratio,” Quar. Pub. Amer.
Stat. Assoc., vol. 18 (1922), pp. 394-96.
$S^2 \div f_x$, the values of the two correlation ratios are then easily
computed from (21) and (22), giving

$$\eta_{yx} = 0.7146, \qquad \eta_{xy} = 0.7010.$$


Probable error of η. The probable error of a correlation ratio is
given approximately by the expression

$$PE = 0.6745\,\frac{1 - \eta^2}{\sqrt{n}}. \tag{23}$$

When applied to the above calculations of $\eta_{yx}$ and $\eta_{xy}$, we have
$\eta_{yx} = 0.7146 \pm 0.0187$, $\eta_{xy} = 0.7010 \pm 0.0195$.
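The quoted values can be retraced from the summed row. The sketch below assumes that 746.60 is the total of the S² ÷ f row, that the ratio of y on x is obtained as [(1/n) Σ(S²/f) − Ȳ²]/σy², and that the probable error is 0.6745 (1 − η²)/√n. Variable names are illustrative.

```python
from math import sqrt

n = 310
sum_S2_over_f = 746.60             # total of the row S^2 / f (class-interval units)
mean_y, sigma_y = -0.3323, 2.1212  # New York mean and standard deviation, class-interval units

eta2_yx = (sum_S2_over_f / n - mean_y ** 2) / sigma_y ** 2
eta_yx = sqrt(eta2_yx)

def probable_error_eta(eta, n):
    """Approximate probable error of a correlation ratio."""
    return 0.6745 * (1 - eta * eta) / sqrt(n)

print(round(eta_yx, 4), round(probable_error_eta(eta_yx, n), 4))   # 0.7146 0.0187
print(round(probable_error_eta(0.7010, n), 4))                     # 0.0195
```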


Test of linearity of regression. Since r is a good measure of
correlation only if we have nearly linear regression, it is necessary to examine our
data for linearity of regression. A necessary and sufficient condition for
linearity is that $\eta^2 - r^2$ shall differ from zero by an amount not greater
than the fluctuations due to random sampling. A common test ¹ is
the comparison of this difference with its probable error,

$$PE = 0.6745\,\frac{2}{\sqrt{n}}\sqrt{(\eta^2 - r^2)\left[(1 - \eta^2)^2 - (1 - r^2)^2 + 1\right]}. \tag{24}$$

If $\eta^2 - r^2$ is small compared with r, or if η and r are both small, an easily
calculated test arising out of (24) is

$$\frac{\sqrt{n}}{0.6745}\cdot\frac{1}{2}\sqrt{\eta^2 - r^2} < 2.5, \tag{25}$$

or

$$n(\eta^2 - r^2) < 11.37. \tag{26}$$

For our temperature problem we find

$$n(\eta_{yx}^2 - r^2) = 9.64, \qquad n(\eta_{xy}^2 - r^2) = 3.66,$$

and the test for linearity is satisfied for both regression lines.
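The simplified criterion is easy to apply by machine. The sketch below assumes that the bound 11.37 arises as (2.5 × 2 × 0.6745)² and applies n(η² − r²) to the rounded values quoted in the text. Because those inputs are rounded to four places, the second statistic comes out a unit or so different in the last place from the figure printed above.

```python
n = 310
r = 0.6925
eta_yx, eta_xy = 0.7146, 0.7010

# Bound implied by requiring the difference to be under 2.5 times its probable error.
threshold = (2.5 * 2 * 0.6745) ** 2

stat_yx = n * (eta_yx ** 2 - r ** 2)
stat_xy = n * (eta_xy ** 2 - r ** 2)

print(round(threshold, 2))                 # 11.37
print(round(stat_yx, 2), stat_yx < threshold)
print(round(stat_xy, 2), stat_xy < threshold)
```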
Corrections for grouping and errors of observation. In using certain
class intervals for grouping data in finding the correlation coefficient or
the correlation ratio, it is important to correct the computed values
for grouping. Sometimes it is also desirable to correct for errors
of observation. Certain appropriate corrections have been published.²


1 J. Blakeman, “On Tests for Linearity of Regression,’’ Biometrika, vol. 4 (1906), 
pp. 332-50. 

2 G. Udny Yule, Introduction to the Theory of Statistics (Edition 1922), pp. 211-14.

T. L. Kelley, Statistical Method (1923), p. 168. 

Karl Pearson, ‘On the Correction to be Made in the Correlation Ratio,’’ Bio- 
metrika, vol. 8 (1911-12), pp. 254-56. 

Student, ‘The Correction to be Made in the Correlation Ratio for Grouping,” 
Biometrika, vol. 9 (1913), pp. 316-20. 
FURTHER METHODS OF DETERMINING CORRELATION


Correlation from ranks. When data are not measurements expressed 
in cardinal numbers, but are merely marks of the orders or ranks of 
individuals in a series, we may seek a measure of correlation between 
corresponding ranks. For example, the following table gives the ranks 
of ten students in two tests: 


STUDENT 


Rank in first test 
Rank in second test 


Changes in rank 


Let $v_x$ and $v_y$ be the ranks of corresponding variables in two series of
n individuals each; then the correlation of $v_x$ and $v_y$ is given by ¹

$$\rho = 1 - \frac{6\sum (v_x - v_y)^2}{n(n^2 - 1)}. \tag{27}$$

Applying this to the student tests above, we have
ρ = 0.891 ± 0.047,
where the probable error is obtained from the expression

$$PE = 0.6745\,\frac{1 - \rho^2}{\sqrt{n}}\left\{1 + .086\,\rho^2 + .013\,\rho^4 + .002\,\rho^6\right\}. \tag{28}$$


Under the assumption of a normal frequency distribution, the
corresponding value of the correlation coefficient of the variables that
correspond to these ranks is given by

$$r = 2\sin\left(\frac{\pi}{6}\,\rho\right), \tag{29}$$

with a probable error

$$PE = 0.7063\,\frac{1 - r^2}{\sqrt{n}}\left\{1 + .042\,r^2 + .008\,r^4 + .002\,r^6\right\}. \tag{30}$$

For the student rank data we find
r = .900 ± .048.
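The rank formulas lend themselves to a short sketch. The ranks used below are hypothetical (the students' ranks themselves are not reproduced here); they are chosen so that the sum of squared rank differences is 18, which yields a ρ of about 0.891 for n = 10. The probable error of ρ is taken as 0.6745 (1 − ρ²)/√n times the bracketed series, and the conversion to r as 2 sin(πρ/6); both are evaluated at the quoted ρ = 0.891.

```python
from math import sqrt, sin, pi

def spearman_rho(rank_a, rank_b):
    """Rank correlation from the sum of squared rank differences."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n * n - 1))

def probable_error_rho(rho, n):
    series = 1 + 0.086 * rho ** 2 + 0.013 * rho ** 4 + 0.002 * rho ** 6
    return 0.6745 * (1 - rho ** 2) / sqrt(n) * series

def r_from_rho(rho):
    """Product-moment r implied by a rank correlation under normality."""
    return 2 * sin(pi * rho / 6)

# Hypothetical ranks of ten individuals in two tests (sum of squared differences = 18).
first = list(range(1, 11))
second = [3, 1, 2, 6, 4, 5, 9, 7, 8, 10]

rho = spearman_rho(first, second)
print(round(rho, 3))                                # 0.891
print(round(probable_error_rho(0.891, 10), 3))      # 0.047
print(round(r_from_rho(0.891), 3))                  # 0.9
```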


It is easily shown that the correlation between actually measured 
variables can be made to change very much without changing ranks. 
Thus the two series 

60, 50, 40, 30, 20 
100, 99, 98, 3, 1 

1 Karl Pearson, “On Further Methods of Determining Correlation,” Drapers’
Company Research Memoirs, Biometric Series IV (1907), p. 13.
are illustrations of perfect correspondence in rank, but the correlation
of the actual numbers is far from perfect. The nature of the distribu- 
tion is a fundamental consideration in the case of correlation in ranks. 
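The two series quoted above can be put through both measures. In this sketch the ranks are assigned in descending order and the rank correlation comes out as unity, while the product-moment coefficient of the actual numbers is noticeably lower. The helper names are hypothetical.

```python
from math import sqrt

def pearson_r(X, Y):
    n = len(X)
    mx, my = sum(X) / n, sum(Y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(X, Y))
    sxx = sum((a - mx) ** 2 for a in X)
    syy = sum((b - my) ** 2 for b in Y)
    return sxy / sqrt(sxx * syy)

def ranks_descending(values):
    """Rank 1 for the largest value; assumes no ties."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

def spearman_rho(rank_a, rank_b):
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n * n - 1))

series_1 = [60, 50, 40, 30, 20]
series_2 = [100, 99, 98, 3, 1]

rho = spearman_rho(ranks_descending(series_1), ranks_descending(series_2))
r = pearson_r(series_1, series_2)
print(rho, round(r, 3))  # 1.0 0.875
```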

Ties in rank. The question of appropriate ranking arises when two 
or more values are equal. In this case two methods are in use: 

(a) The bracket rank method. In this method the individual values
of the ties are all assigned the same rank, one greater than that of the
individual which immediately preceded the ties. The next individual
after the ties takes the rank it would have had if each of the preceding
individuals in the ties had different ranks.

(b) The mid-rank method. In this method, the individual values 
of the tie are given equal rank but that rank is the value at the middle 
of the set of ties. The following illustrates both methods: 


MEASUREMENT      BRACKET METHOD      MID-RANK METHOD
     …                 1                   1
    80                 2                   2
    85                 3                   4
    85                 3                   4
    85                 3                   4
    90                 6                   6.5
    90                 6                   6.5
    95                 8                   8
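Both ways of ranking ties can be sketched in a few lines. The measurements below are hypothetical in their first entry (75 is an assumed value standing in for the smallest measurement), and the function names are illustrative. The bracket rank is one more than the count of strictly smaller values; the mid-rank is the middle of the positions the tied values would occupy.

```python
def bracket_ranks(values):
    """Tied values share the rank one greater than that of the value preceding the ties."""
    return [1 + sum(1 for w in values if w < v) for v in values]

def mid_ranks(values):
    """Tied values share the rank at the middle of the positions they occupy."""
    result = []
    for v in values:
        smaller = sum(1 for w in values if w < v)
        equal = sum(1 for w in values if w == v)
        result.append(smaller + (equal + 1) / 2)
    return result

# Hypothetical measurements; the first value (75) is assumed for illustration.
measurements = [75, 80, 85, 85, 85, 90, 90, 95]

print(bracket_ranks(measurements))  # [1, 2, 3, 3, 3, 6, 6, 8]
print(mid_ranks(measurements))      # [1.0, 2.0, 4.0, 4.0, 4.0, 6.5, 6.5, 8.0]
```

The mid-rank method keeps the sum of the ranks equal to that of an untied series, which the bracket method does not.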


Spearman’s foot-rule. Spearman has suggested ¹ a very simple
formula for finding correlation from ranks which he called the
“foot-rule” formula,

$$R = 1 - \frac{L}{M}, \tag{31}$$

where L denotes the sum of the positive differences in rank and
$M = \frac{n^2 - 1}{6}$. As its name indicates, this is a very rough estimate of the
correlation between the variables. When we have a normal distribution
of the actual measurements, R does not in general approximate r.
On the assumption of a normal distribution Pearson has shown that

$$r = 2\cos\left[\frac{\pi}{3}\,(1 - R)\right] - 1. \tag{32}$$
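A minimal sketch of the foot-rule, assuming the readings R = 1 − L/M with M = (n² − 1)/6 and Pearson's conversion r = 2 cos[π(1 − R)/3] − 1. The ranks are hypothetical and the function names are illustrative.

```python
from math import cos, pi

def foot_rule(rank_a, rank_b):
    """Spearman's foot-rule from the sum of the positive rank differences."""
    n = len(rank_a)
    L = sum(b - a for a, b in zip(rank_a, rank_b) if b > a)
    M = (n * n - 1) / 6
    return 1 - L / M

def r_from_foot_rule(R):
    """Product-moment r implied by R on the assumption of normality."""
    return 2 * cos(pi * (1 - R) / 3) - 1

# Hypothetical ranks of ten individuals in two tests.
first = list(range(1, 11))
second = [3, 1, 2, 6, 4, 5, 9, 7, 8, 10]

R = foot_rule(first, second)
print(round(R, 4), round(r_from_foot_rule(R), 4))
```

The conversion maps R = 1 to r = 1 and R = 0 to r = 0, but between these values r is larger than R, which is why R by itself does not approximate r.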


Tetrachoric correlation. The problem of measuring correlation 
sometimes arises in connection with variables which do not admit of 
exact measurement, or only admit of it with very great labor; for ex- 
ample, colors, temperaments, shapes, and so on. Nevertheless, it may 

1 C. Spearman, “A Foot-rule for Measuring Correlation,” Brit. Journal of
Psychology, vol. 2 (1906), p. 89; also vol. 3 (1910), p. 271.
appear legitimate to assume for purposes of determining correlation,
that there lie back of our categorical classifications of these variables
certain quantitative values that are normally distributed. Again we
may have measurable variables in very broad categories, for example,
high and low temperatures; tall, medium, and short individuals; grades
A, B, C, D, and E in school subjects. If there are only two categories,
we get a fourfold table of the form


Such a table would arise in our Boston-New York temperatures 
problem if we simply recorded high or low maximal temperatures. If 
we consider all temperatures above 77.5 as high and other temperatures 
low, our table of page 123 would reduce to 


                        Boston
                     High     Low
New York    High      177      44
            Low        31      58


Under the assumption of a distribution conforming to the normal
frequency surface, Pearson has shown that the correlation coefficient r
(called tetrachoric r) is given by a root of the equation

$$\frac{d}{n} = \tau_0\tau_0' + \tau_1\tau_1'\,r + \tau_2\tau_2'\,r^2 + \tau_3\tau_3'\,r^3 + \tau_4\tau_4'\,r^4 + \tau_5\tau_5'\,r^5 + \tau_6\tau_6'\,r^6 + \cdots, \tag{33}$$

where $\tau_0 = \frac{b + d}{n}$, $\tau_0' = \frac{c + d}{n}$,¹ and the other τ’s, called tetrachoric
functions, are given to $\tau_6$ in Tables for Statisticians and Biometricians,
pp. 42-51. Methods are shown on page l of these tables for obtaining
values of $\tau_m$ when m ≥ 7, if they are needed.

For the above illustration the equation of degree six in r is

.18710 = .09446 + .12323 r + .01532 r² + .01130 r³ + .00961 r⁴
         + .00231 r⁵ + .00665 r⁶,

from which we find r = .653.
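The root of the sextic can be located numerically. The sketch below uses plain bisection on the interval from 0 to 1, which is an illustrative choice and not the method of the Tables; it works because all the coefficients are positive, so the right-hand side increases steadily with r. It also checks the two end terms against the fourfold table, taking d = 58, b + d = 102, and c + d = 89.

```python
# Coefficients of r^0 ... r^6 on the right-hand side, as printed in the text.
coefficients = [0.09446, 0.12323, 0.01532, 0.01130, 0.00961, 0.00231, 0.00665]
target = 0.18710   # d / n for the fourfold temperature table

def series(r):
    return sum(c * r ** k for k, c in enumerate(coefficients))

low, high = 0.0, 1.0          # series(0) < target < series(1)
for _ in range(60):           # bisection: halve the bracket each step
    middle = (low + high) / 2
    if series(middle) < target:
        low = middle
    else:
        high = middle
root = (low + high) / 2

print(round(root, 3))                            # 0.653
print(round(58 / 310, 5))                        # 0.1871, the left-hand side
print(round((102 / 310) * (89 / 310), 5))        # 0.09446, the constant term
```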

If the division between high and low temperatures had been 80.5 de- 
grees, we would find r = .663. If the data were distributed normally, 
the value of r would not be changed by changing the point dividing 
high temperatures from low temperatures. The tetrachoric r for a 


1 The separation into a fourfold table should be such that $\tau_0 \le .5$ and $\tau_0' \le .5$.
normal distribution agrees with the correlation coefficient for the case
of an indefinitely fine grouping of measurable values. 

The probable error ¹ of tetrachoric r is relatively much larger than the
probable error of r calculated from measurements, often in the
neighborhood of three times the probable error of the r calculated from
measurements.

The Yule coefficients. As a method of measuring association between
attributes, say between vaccination and recovery from smallpox,
Yule devised a “coefficient of association” ² for a fourfold table which
is designated by the letter “Q,” where

$$Q = \frac{ad - bc}{ad + bc}. \tag{34}$$

In comparison with other measures of association, this coefficient seems 
to have no advantages except its simplicity. 
In 1912, Yule published ³ another coefficient

$$\omega = \frac{\sqrt{ad} - \sqrt{bc}}{\sqrt{ad} + \sqrt{bc}}, \tag{35}$$

which he termed the “coefficient of colligation.”

Yule has expressed his preference for ω rather than Q as a measure
of association. The use of both these coefficients has been the subject
of vigorous criticism by Pearson and Heron.⁴
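For a concrete feel of the two coefficients, the sketch below evaluates them on the fourfold temperature table used earlier (a = 177, b = 44, c = 31, d = 58). Applying them to that table is an illustration only; the definitions assumed are Q = (ad − bc)/(ad + bc) and ω = (√ad − √bc)/(√ad + √bc).

```python
from math import sqrt

def yule_q(a, b, c, d):
    """Yule's coefficient of association for a fourfold table."""
    return (a * d - b * c) / (a * d + b * c)

def yule_omega(a, b, c, d):
    """Yule's coefficient of colligation for a fourfold table."""
    return (sqrt(a * d) - sqrt(b * c)) / (sqrt(a * d) + sqrt(b * c))

a, b, c, d = 177, 44, 31, 58
Q = yule_q(a, b, c, d)
omega = yule_omega(a, b, c, d)
print(round(Q, 3), round(omega, 3))

# The two coefficients are tied together by Q = 2*omega / (1 + omega**2).
print(round(2 * omega / (1 + omega ** 2), 3))
```

Both values stand well apart from the tetrachoric r of .653 found for the same table, which illustrates why the choice of coefficient matters.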

Method of contingency.⁵ In this method, the degree of contingency
between two variables is measured by a function of the difference be- 
tween the numbers actually found in the cells of a correlation table and 
the numbers that would be found if the two variables were independent 
in the probability sense. 

In the notation of page 124, the mean square contingency is defined by

$$\phi^2 = \frac{1}{n}\sum \frac{\left(n_{st} - \dfrac{f_x f_y}{n}\right)^2}{\dfrac{f_x f_y}{n}}, \tag{36}$$

the summation to extend to all compartments of the table.


1 For the method of calculation of the probable error of tetrachoric r, see Tables
for Statisticians, etc. (1914), pp. xl-xlii; cf. Biometrika, vol. 9 (1913), pp. 22-27.

2 Phil. Trans. Roy. Soc., Series A, vol. 194 (1900), p. 257; G. Udny Yule,
Introduction to the Theory of Statistics (1915), p. 38.

3 Journal of the Royal Statistical Soc., vol. 75 (1911-12), pp. 579-652.

4 “On the Theories of Association,” Biometrika, vol. 9 (1913), pp. 159-315.

5 Karl Pearson, “On the Theory of Contingency and its Relation to Association and 
Normal Correlation,” Drapers’ Company Research Memoirs, Biometric Series I (1904). 
The coefficient of mean square contingency is defined by

$$C_1 = \sqrt{\frac{\phi^2}{1 + \phi^2}}. \tag{37}$$


If temperatures above 86.5 are called high, those below 74.5 low, and 
the others medium, the Boston-New York table of temperatures, page 
123, becomes : 


Boston 
Low Medium High , tt 
5 Medium 38 207 
ey oe Fon eee, 
Genero 


From this we find $\phi^2 = .4849$ and $C_1 = .551$.

With normal distribution and certain restrictions as to grouping, $C_1$
becomes numerically equal to r.

The method of contingency is chiefly used in measuring correlation
between variables not capable of quantitative measurement, though
there may be many graduations. One objection to the use of $C_1$ is that
it varies with the grouping in the table. This, however, may be
overcome by certain corrections.¹ When the data are grouped in κ rows and
λ columns, and are to be considered as a random sample, the chief
correction for grouping is the subtraction of

$$\frac{(\kappa - 1)(\lambda - 1)}{n}$$

from $\phi^2$, $C_1$ being then calculated from the corrected $\phi^2$.

Bi-serial r. We wish sometimes to find the correlation between two 
variables, one of which is measurable while the other is given only in 
alternative categories.2 For example, X may be the grade of a student 
in mathematics while Y is simply “ athletic ” or “ non-athletic’; or 
X may record the temperatures of Boston in many groups while Y, the 
temperature of New York, may be recorded merely as either high or 
low, giving a two-rowed correlation table as shown in the diagram. 


1 Karl Pearson, “‘On the Measurement of the Influence of ‘Broad Categories’ on 
Correlation,” Biometrika, vol. 9 (1913), pp. 116-39 and 216-17. 

2 Karl Pearson, “On a New Method of Determining Correlation between a Measured
Character A, and a Character B of which only the Percentage of Cases wherein
B exceeds (or falls short of) a given intensity is recorded for each Grade of A,”
Biometrika, vol. 7 (1909-10), pp. 96-105.
TABLE II. BI-SERIAL CORRELATION TABLE FOR BOSTON-NEW YORK MAXIMAL
DAILY JULY TEMPERATURES FOR YEARS 1911-1920


x Boston 


61 | 64 | 67 | 70 | 73 | 76 | 79 | 82 | 85 | 88 | 91 | 94 | 97 | 100) 103 


If we assume linear regression of the X on Y and that the distribution
of the variable Y given in alternative groupings is approximately
normal, then the correlation coefficient, called bi-serial r, is found as follows:

Let $\bar{X}_s$ be the mean of the row containing the smaller number, $n_s$, of
entries, $\bar{X}$ the mean of all the X’s, $\sigma_x$ the standard deviation of the X’s,
and n the total number of entries in the table. Bi-serial r is then found
by dividing

$$\frac{n_s}{n}\cdot\frac{\bar{X}_s - \bar{X}}{\sigma_x}$$

by the value of $\phi(t)$ in the tables on page 209 corresponding to $\frac{1}{2} - \frac{n_s}{n}$
in the column headed $\int_0^t \phi(t)\,dt$.

In the above table of temperatures, we find $\bar{X} = 80.762$, $\bar{X}_s = 87.279$,
$\sigma_x = 8.4357$, n = 310, $n_s = 104$, $\frac{n_s}{n} = .3355$, from which we
obtain

$$\frac{n_s}{n}\cdot\frac{\bar{X}_s - \bar{X}}{\sigma_x} = .2592.$$

From table, p. 209, corresponding to $\frac{1}{2} - \frac{n_s}{n} = .1645$ in the $\int_0^t \phi(t)\,dt$
column we find $\phi(t) = .3645$, hence bi-serial r is $\frac{.2592}{.3645} = .711$.

The probable error of bi-serial r is greater than the probable error for
the product moment coefficient.¹ Under certain distributions it may
be twice as great.
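The table lookup in the bi-serial computation can be replaced by the normal distribution functions of the Python standard library. In this sketch, `NormalDist().inv_cdf` stands in for finding the deviate t whose area from the mean is ½ − nₛ/n, and `NormalDist().pdf` for reading off the ordinate at t. The function and argument names are hypothetical.

```python
from statistics import NormalDist

def biserial_r(mean_small, mean_all, sigma_x, n_small, n):
    """Bi-serial r from the mean of the smaller row, the general mean, and sigma_x."""
    p = n_small / n
    # Deviate t with area (1/2 - p) between the mean and t, i.e. upper tail equal to p.
    t = NormalDist().inv_cdf(1 - p)
    ordinate = NormalDist().pdf(t)            # ordinate of the unit normal curve at t
    return p * (mean_small - mean_all) / sigma_x / ordinate

r_bis = biserial_r(mean_small=87.279, mean_all=80.762, sigma_x=8.4357, n_small=104, n=310)
print(round(r_bis, 3))  # 0.711
```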

Pearson has given a method for finding the correlation when X is
given in multiple categories and Y in alternative categories, no
assumptions being made as to regression or distribution.²


1 H. E. Soper, “On the Probable Error of the Bi-serial Expression for the
Correlation Coefficient,” Biometrika, vol. 10 (1914-15), pp. 384-90.

2 Karl Pearson, “On a New Method of Determining Correlation when one Variable
is Given by Alternative and the Other by Multiple Categories,” Biometrika, vol.
7 (1909-10), pp. 248-57.
Difficulties in the interpretation of correlation. Although the
theory of correlation is very useful in the quantitative description and
characterization of phenomena, many difficulties arise in the
interpretation of the correlation coefficient. For example, the values of two
variables x and y may be uncorrelated with each other and with a third
variable z, but the quotients x/z and y/z or the products xz, yz would
show decided correlation.¹
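The effect of a common divisor or multiplier can be seen in a small simulation. This is only a sketch under stated assumptions: the three variables are drawn independently from a uniform distribution on (5, 15), the sample size and seed are arbitrary, and the helper `pearson_r` is written out here rather than taken from any source.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000
x = [rng.uniform(5, 15) for _ in range(n)]
y = [rng.uniform(5, 15) for _ in range(n)]
z = [rng.uniform(5, 15) for _ in range(n)]

r_xy = pearson_r(x, y)  # near zero: x and y are independent
r_ratio = pearson_r([a / c for a, c in zip(x, z)],
                    [b / c for b, c in zip(y, z)])
r_product = pearson_r([a * c for a, c in zip(x, z)],
                      [b * c for b, c in zip(y, z)])
print(round(r_xy, 3), round(r_ratio, 3), round(r_product, 3))
```

Under these assumptions the correlation of the quotients and of the products comes out near one half, although x, y, and z are mutually independent.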

Again, two variables may be uncorrelated in each of two records, but
show correlation in mixed records.² For example, take one set of
records consisting of n pairs of uncorrelated items with mean values
(x, y), and another set of n pairs of uncorrelated items with mean values
(x', y'), where x' and y' are respectively greater than x and y. Even with
the zero correlations in each of the two sets of n, we may nevertheless
have a considerable positive correlation coefficient r from the 2n pairs
of values obtained by combining the two sets.
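A minimal sketch of the mixed-records case follows. The means (0, 0) and (3, 3), the unit standard deviations, the normal form of the distributions, and the seed are all assumptions chosen for illustration.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


rng = random.Random(0)
n = 2000
# First record: uncorrelated pairs with means (0, 0).
x1 = [rng.gauss(0, 1) for _ in range(n)]
y1 = [rng.gauss(0, 1) for _ in range(n)]
# Second record: uncorrelated pairs with both means raised to 3.
x2 = [rng.gauss(3, 1) for _ in range(n)]
y2 = [rng.gauss(3, 1) for _ in range(n)]

r_first = pearson_r(x1, y1)
r_second = pearson_r(x2, y2)
r_mixed = pearson_r(x1 + x2, y1 + y2)  # the 2n pairs taken together
print(round(r_first, 3), round(r_second, 3), round(r_mixed, 3))
```

Each record taken alone shows a coefficient near zero, while the 2n mixed pairs show a decided positive coefficient produced solely by the difference in means.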

In some cases part or all of the correlation between two variables 
may be traceable to a third variable whose effect could be eliminated 
by the methods of partial correlation theory. (See Chap. IX.) 

1 Karl Pearson, “On a Form of Spurious Correlation that may Arise when Indices 
are Used in the Measurement of Organs,” Proc. Roy. Soc., vol. 60 (1897), p. 489. 

G. Udny Yule, “On the Interpretation of Correlations between Indices or Ratios,” 
Jour. Royal Stat. Soc., vol. 73 (1910), p. 644. 

2 Karl Pearson, Phil. Trans., vol. 192 (1899), p. 257; cf. G. Udny Yule,
Introduction to the Theory of Statistics (1917), pp. 218-19.


CHAPTER IX 


PARTIAL AND MULTIPLE CORRELATION 
By TRUMAN L. KELLEY 


GENERALIZED CORRELATION COEFFICIENTS 


Meaning, notation, and formulas. The basic problem of multiple 
correlation is to estimate the value of a variable that corresponds to 
assigned values of two or more other variables. For example, we may 
have assigned values of the maximal daily temperatures of Boston, 
Philadelphia, and Buffalo, and seek from these values the best estimate 
of the temperature of New York City. The use of the linear regression 
equation in two variables for estimating Y when X is assigned is pre- 
sented on page 126. It may be recalled that the mathematical advan- 
tage of linear regression in dealing with two variables is that it leads to 
a simple equation permitting the estimation of one variable knowing 
the other. 

We now seek an appropriate extension of the method of linear
regression to n variables $X_1, X_2, \dots, X_n$. To be more precise, let us assume
that we may estimate $X_1$ from assigned values of $X_2, X_3, \dots, X_n$ by
means of the linear function

$$\bar{X}_1 = b_{12.34 \dots n} X_2 + b_{13.24 \dots n} X_3 + \dots + b_{1n.23 \dots n-1} X_n + c, \qquad (1)$$

in which the b's and the c are constants, so chosen that the $\bar{X}_1$'s computed
from (1) are the "best" (the meaning of "best" as here used is given
in Chapter 8, page 126) estimates of the observed $X_1$'s which can be made
by means of a linear function of the assigned values of $X_2, X_3, \dots, X_n$.
The ordinary product-moment coefficient of correlation between the $X_1$
as thus estimated and the observed $X_1$ is called the "multiple"
correlation coefficient.

If measures are expressed as deviations from their own means and
divided by their own standard deviations, the regression equation (1)
simplifies. Let

$$z_1 = \frac{X_1 - M_1}{\sigma_1}, \quad z_2 = \frac{X_2 - M_2}{\sigma_2}, \quad \text{etc.},$$

in which the M's are successive means and the $\sigma$'s successive standard
deviations. The regression equation connecting the z's is

$$\bar{z}_1 = \beta_{12.34 \dots n} z_2 + \beta_{13.24 \dots n} z_3 + \dots + \beta_{1n.23 \dots n-1} z_n, \qquad (2)$$

in which the b's and $\beta$'s are connected by the relationships:

$$b_{12.34 \dots n} = \beta_{12.34 \dots n} \frac{\sigma_1}{\sigma_2}, \quad b_{13.24 \dots n} = \beta_{13.24 \dots n} \frac{\sigma_1}{\sigma_3}, \quad \text{etc.} \qquad (3)$$

Also:

$$c = M_1 - b_{12.34 \dots n} M_2 - b_{13.24 \dots n} M_3 - \dots - b_{1n.23 \dots n-1} M_n. \qquad (4)$$


Knowing equation (2) and having relationships (3) and (4), it is but a 
step to secure equation (1), which is the most serviceable form of the 
regression equation for actual use. 

The problem is then the determination of the values of the $\beta$ constants
in terms of the total correlation coefficients. If $z_1$ is the value of the
first variable and $\bar{z}_1$ the estimated value as defined in (2), then
$(z_1 - \bar{z}_1)$ is an error of estimate. Note that $\sum z_1 = 0$, and inspection
of (2) shows that $\sum \bar{z}_1 = 0$; therefore $(z_1 - \bar{z}_1)$ is a deviation from a
mean, and the standard deviation of such errors of estimate, which we will
represent by the symbol $k_{1.23 \dots n}$, is the standard error of estimate of
$z_1$ variates. It is given by

$$k^2_{1.23 \dots n} = \frac{\sum (z_1 - \bar{z}_1)^2}{N}.$$

Further,

$$\frac{\sum (X_1 - \bar{X}_1)^2}{N} = \sigma_1^2 \, \frac{\sum (z_1 - \bar{z}_1)^2}{N},$$

so that $\sigma_{1.23 \dots n}$, the standard error¹ of estimate of $X_1$ variates, is
given by

$$\sigma_{1.23 \dots n} = \sigma_1 k_{1.23 \dots n}. \qquad (5)$$

If $r_{1.23 \dots n}$ is the correlation between $X_1$ and the linear function of
$X_2, X_3, \dots, X_n$ from which we estimate $X_1$, then by parity with
formula (14) of Chapter 8, page 128 ($\sigma_{1.2} = \sigma_1 \sqrt{1 - r^2_{12}}$) we have

$$\sigma_{1.23 \dots n} = \sigma_1 \sqrt{1 - r^2_{1.23 \dots n}}, \qquad (6)$$

so that

$$k_{1.23 \dots n} = \sqrt{1 - r^2_{1.23 \dots n}}. \qquad (7)$$

The reader will note that since $r^2_{1.23 \dots n} + k^2_{1.23 \dots n} = 1$ the
correlation and alienation coefficients are related to each other as are the
sine and cosine of an angle.


1 Cf. p. 128.
It only remains to find $k_{1.23 \dots n}$, the multiple alienation coefficient,
in order to determine $r_{1.23 \dots n}$, the multiple correlation coefficient. We
may write as the function which is to be made a minimum,

$$f = N k^2_{1.23 \dots n}.$$

Then we have

$$f = \sum (z_1 - \beta_{12.34 \dots n} z_2 - \beta_{13.24 \dots n} z_3 - \dots - \beta_{1n.23 \dots n-1} z_n)^2. \qquad (8)$$

Taking partial derivatives with respect to $\beta_{12.34 \dots n}$, $\beta_{13.24 \dots n}$, etc.,
in turn, setting them equal to zero and solving the resulting set of
simultaneous equations, yields the values of the $\beta$'s which will lead to
the minimal error of estimate. We may write the answer in convenient
form, if total correlation coefficients, $r_{12}$, $r_{13}$, etc., have the meaning of
Chapter 8, page 122, if we let $\Delta$ stand for the determinant,

$$\Delta = \begin{vmatrix}
1 & r_{12} & r_{13} & \cdots & r_{1n} \\
r_{12} & 1 & r_{23} & \cdots & r_{2n} \\
r_{13} & r_{23} & 1 & \cdots & r_{3n} \\
\vdots & \vdots & \vdots & & \vdots \\
r_{1n} & r_{2n} & r_{3n} & \cdots & 1
\end{vmatrix} \qquad (9)$$

and if we let $\Delta_{11}$, $\Delta_{12}$, etc., stand for the minors obtained by deleting
the first row and first column; the first row and second column; etc.
With this notation the $\beta$'s are given by

$$\begin{aligned}
\beta_{12.34 \dots n} &= \Delta_{12}/\Delta_{11}, \\
\beta_{13.24 \dots n} &= -\Delta_{13}/\Delta_{11}, \\
\beta_{14.235 \dots n} &= \Delta_{14}/\Delta_{11}, \\
\beta_{1p.23 \dots n} &= (-1)^p \Delta_{1p}/\Delta_{11}, \quad \text{etc.}
\end{aligned} \qquad (10)$$

The multiple alienation coefficient is (see Pearson, Karl, Biom., v. 8,
p. 439, eq. VI, 1912)

$$k_{1.23 \dots n} = \sqrt{\Delta / \Delta_{11}}. \qquad (11)$$


This alienation coefficient may also be obtained from partial alienation
coefficients of lower order. Partial alienation coefficients may be
defined as constants related to partial correlation coefficients as are
multiple and total alienation coefficients related to multiple and total
correlation coefficients. Thus, in general, k with any subscript is related to
r with the same subscript by the equation,

$$k^2 + r^2 = 1. \qquad (12)$$
Then, as first derived by Yule (Proc. Roy. Soc., A, v. 79, 1907), but using
the present notation:

$$k_{1.23 \dots n} = k_{12.34 \dots n} \, k_{13.45 \dots n} \cdots k_{1n}. \qquad (13)$$

The multiple correlation coefficient is,

$$r_{1.23 \dots n} = \sqrt{1 - k^2_{1.23 \dots n}}. \qquad (14)$$

This completes the determination of the basic constants of the
regression equation, as but very simple substitutions are required to get
equation (1) having equation (2).

It is sometimes desired to know the "partial" correlation between
two variables, that is, the correlation between two variables independent
of one or more additional variables. The symbol for the partial
correlation of $X_1$ and $X_2$ is $r_{12.34 \dots n}$ and may be read, "the correlation
between $X_1$ and $X_2$ independent of $X_3, X_4, \dots, X_n$," or "the correlation
between $X_1$ and $X_2$ for constant values of $X_3, X_4, \dots, X_n$." The partial
correlation expressed by $r_{12.34 \dots n}$ may be interpreted as an average
value of the correlations between those values of $X_1$ and $X_2$ which
correspond to different assigned values of the remaining n - 2 variables.
This partial correlation is given by the first or second of the following
equalities:¹

$$r_{12.34 \dots n} = \sqrt{\beta_{12.34 \dots n} \, \beta_{21.34 \dots n}} = \sqrt{b_{12.34 \dots n} \, b_{21.34 \dots n}}. \qquad (15)$$

The two $\beta$'s giving the partial correlation coefficient have been called
conjugate regression coefficients by Kelley.

Formulas (10) give the value of any $\beta$ regression coefficient as the
quotient of two determinants. A coefficient, $\beta$, of a given order may also
be evaluated in terms of $\beta$-coefficients of lower order, by the following
equation:

$$\beta_{12.34 \dots n} = \frac{\beta_{12.45 \dots n} - \beta_{13.45 \dots n} \, \beta_{32.45 \dots n}}{1 - \beta_{23.45 \dots n} \, \beta_{32.45 \dots n}}. \qquad (16)$$

In equation (16) one (called the unique secondary subscript) and only
one of the secondary subscripts appearing in the $\beta$ in the left-hand
member has disappeared from the secondary subscripts in the $\beta$'s in the
right-hand member. Since all but one of the secondary subscripts
appear as secondary subscripts in both members the general principle
may be illustrated by a $\beta$ of the second order:

$$\beta_{12.34} = \frac{\beta_{12.4} - \beta_{13.4} \, \beta_{32.4}}{1 - \beta_{23.4} \, \beta_{32.4}}.$$


1 See T. L. Kelley, Statistical Method (1923); and G. Udny Yule, Introduction to
the Theory of Statistics (1912).

2 T. L. Kelley, Chart to Facilitate the Calculation of Partial Coefficients of
Correlation and Regression Coefficients (1921).
The first primary subscript in the left-hand member term becomes the
first primary subscript in the first and second $\beta$'s in the numerator of
the right-hand member. The second primary subscript in the left-hand
member term becomes the second primary subscript in the first
and third $\beta$'s in the numerator. The remaining two primary subscripts
of the numerator $\beta$'s are identical and are the unique secondary
subscript. The denominator $\beta$'s are the third numerator $\beta$ and its
conjugate.

From these general directions it is obvious that there are as many
different ways for expressing a regression coefficient of a given order as
the order of the coefficient. Thus $\beta_{12.34 \dots n}$ may be expressed in n - 2
ways, equation (16) being one such. In practice it is desirable to
calculate in at least two ways as a check.

Formula (16) simplifies in case a partial regression coefficient of the
first order is being calculated:

$$\beta_{12.3} = \frac{\beta_{12} - \beta_{13} \beta_{32}}{1 - \beta_{23} \beta_{32}} = \frac{r_{12} - r_{13} r_{23}}{1 - r^2_{23}}. \qquad (17)$$

By repeated use of equations (16) and (17) in calculating regression
coefficients of a given order from those of an order one less, every
regression coefficient may be obtained. In case determinants are not used
equations (16) and (17) will serve as the basis in calculating $\beta$'s.
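The repeated use of equations (16) and (17) may be sketched as a recursive routine. The function `beta`, the dictionary layout keyed by unordered pairs of variable numbers, and the four-variable correlations in `r4` are illustrative assumptions; only the three total correlations in `cady` are taken from the sample problem of this chapter.

```python
def beta(r, i, j, held=()):
    """Standardized regression coefficient beta_{ij.held} by formula (16).

    r maps frozenset({a, b}) to the total correlation r_ab; `held` lists
    the secondary subscripts. The first entry of `held` is treated as the
    unique secondary subscript that is removed at this step.
    """
    if not held:
        return r[frozenset((i, j))]  # zero order: beta_ij = r_ij
    k, rest = held[0], tuple(held[1:])
    numerator = beta(r, i, j, rest) - beta(r, i, k, rest) * beta(r, k, j, rest)
    denominator = 1.0 - beta(r, j, k, rest) * beta(r, k, j, rest)
    return numerator / denominator


# Total correlations of the three-variable sample problem.
cady = {frozenset((1, 2)): 0.6076,
        frozenset((1, 3)): -0.2064,
        frozenset((2, 3)): -0.1715}
beta_12_3 = beta(cady, 1, 2, (3,))
beta_13_2 = beta(cady, 1, 3, (2,))

# Hypothetical four-variable correlations, used only to show the check
# of calculating the same coefficient in two ways.
r4 = {frozenset((1, 2)): 0.5, frozenset((1, 3)): 0.4, frozenset((1, 4)): 0.3,
      frozenset((2, 3)): 0.3, frozenset((2, 4)): 0.2, frozenset((3, 4)): 0.1}
way_a = beta(r4, 1, 2, (3, 4))  # unique secondary subscript 3
way_b = beta(r4, 1, 2, (4, 3))  # unique secondary subscript 4
print(round(beta_12_3, 5), round(beta_13_2, 5), round(way_a, 6), round(way_b, 6))
```

The two orders of elimination should agree to within rounding, which is the check recommended in the text.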

For a three- or four-variable problem one of the two methods here
given, a method given by Yule,¹ or a method utilizing an alignment
chart devised by Kelley² will prove serviceable, and for problems
involving more than six variables Kelley³ gives a greatly abbreviated
method. The determinantal solution is the most convenient for
theoretical work and probably the preferred one for practical work in case
of a five- or six-variable problem.

A sample three-variable problem. The steps in the calculations may 
be illustrated from the following data provided by Mr. V. M. Cady. 
The data will serve in the problem of the next section involving partial 
and multiple correlation ratios, as well as in this problem, and thus en- 
able a comparison between correlation coefficient and correlation ratio 
methods. 

The tabulated numbers are the individual observations, thus each 
tabular number refers to one individual. The values of the first variable 


1 G. Udny Yule, Introduction to the Theory of Statistics, chap. XII (1912).

2 T. L. Kelley, Chart to Facilitate the Calculation of Partial Coefficients of Correlation
and Regression Equations (1921).

3 T. L. Kelley, Statistical Method (1923, pp. 302-08).
TABLE OF DATA


3 
Fa ds Pale Mes cena ed ah MAN nC AT a a 2 a at tt 
3 4 5 6 8 ats 
LE ePabe be aren ve © 7 
8/11 10}10 9); -8 9 
LIS O10 N Oia 7. 
10 AS pall eee ees 
=a 10 9 tebe OPE 6 8 
12 Oni Uieo 
122 bao 
12 
Safe Gr 29 9 
peer ate Ui ews ae 
Ce teO elo Os hO- eo 
&® 2110 34 4 
2% =f 8 6 6 
6 6 
9 3 
6 7 
7 6 
6 7 
4 7 
7 
SaelOe 4 er 8 a2 we 
feo LL ee eke 2 
7 ce Rte 
m=yY By al DG 
0 6] 5 6 
4 2] 6 
3 0 
3 8 


are recorded in the cells of the table. This variable is the rating in
"honesty" given by the teachers of certain school children. The second
variable is categorical, consisting of three groups. The $\gamma$ group comprises
children rated as the most incorrigible by the school principal or
disciplinarian. The $\beta$ group comprises children rated as of average
corrigibility, and the $\alpha$ group the most corrigible children. The third
variable is the score in a test measuring the extent to which the children's
judgments of the seriousness of offenses correlate with adult judgments
of the same items. The distribution of children in the categorical
trait would probably be tri-modal, if finely graded measures were
available. In the partial and multiple correlation ratio treatment the
categorical nature of this variable is no hindrance, but in order to
calculate partial and multiple correlation coefficients we must assign
numerical values to the categories. We will assign the quite arbitrary
values, $\alpha = 3$; $\beta = 2$; $\gamma = 1$. Straightforward calculation yields:


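                         VARIABLES
                     X1        X2        X3
Correlations   X1  1.0000     .6076   - .2064
               X2   .6076    1.0000   - .1715
               X3 - .2064   - .1715    1.0000
Means              6.7573    2.0294    5.4191
σ's                2.9293     .8220    1.3036

$$\Delta = \begin{vmatrix} 1.0000 & .6076 & -.2064 \\ .6076 & 1.0000 & -.1715 \\ -.2064 & -.1715 & 1.0000 \end{vmatrix} = .60182.$$

$$\Delta_{11} = \begin{vmatrix} 1.0000 & -.1715 \\ -.1715 & 1.0000 \end{vmatrix} = .97059.$$

$$\Delta_{12} = \Delta_{21} = \begin{vmatrix} .6076 & -.1715 \\ -.2064 & 1.0000 \end{vmatrix} = .57220.$$

$$\Delta_{13} = \begin{vmatrix} .6076 & 1.0000 \\ -.2064 & -.1715 \end{vmatrix} = .10220.$$

$$\Delta_{22} = \begin{vmatrix} 1.0000 & -.2064 \\ -.2064 & 1.0000 \end{vmatrix} = .95740.$$

$$k_{1.23} = \sqrt{\Delta/\Delta_{11}} = .78744, \qquad r_{1.23} = \sqrt{1 - k^2_{1.23}} = .61639.$$

$$\beta_{12.3} = \Delta_{12}/\Delta_{11} = .58954, \qquad \beta_{13.2} = -\Delta_{13}/\Delta_{11} = -.10530,$$

$$\beta_{21.3} = \Delta_{21}/\Delta_{22} = .59766.$$

$$r_{12.3} = \sqrt{\beta_{12.3} \, \beta_{21.3}} = .59359.$$

$$\bar{z}_1 = \beta_{12.3} z_2 + \beta_{13.2} z_3 = .58954 \, z_2 - .10530 \, z_3.$$

$$\bar{X}_1 = 2.1008 \, X_2 - .2366 \, X_3 + 3.7761.$$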
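The determinantal solution for the three-variable problem can be retraced with the sketch below, which follows formulas (3), (4), (10), (11), (14), and (15). The helper names `det` and `minor_matrix` are illustrative; the correlations, means, and standard deviations are those tabulated above.

```python
from math import sqrt


def minor_matrix(m, row, col):
    """Matrix left after deleting one row and one column."""
    return [[v for c, v in enumerate(r) if c != col]
            for i, r in enumerate(m) if i != row]


def det(m):
    """Determinant by expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** col * value * det(minor_matrix(m, 0, col))
               for col, value in enumerate(m[0]))


R = [[1.0000, 0.6076, -0.2064],
     [0.6076, 1.0000, -0.1715],
     [-0.2064, -0.1715, 1.0000]]
means = [6.7573, 2.0294, 5.4191]
sds = [2.9293, 0.8220, 1.3036]

delta = det(R)
d11 = det(minor_matrix(R, 0, 0))
d12 = det(minor_matrix(R, 0, 1))
d13 = det(minor_matrix(R, 0, 2))
d21 = det(minor_matrix(R, 1, 0))
d22 = det(minor_matrix(R, 1, 1))

k_1_23 = sqrt(delta / d11)              # formula (11)
r_1_23 = sqrt(1.0 - k_1_23 ** 2)        # formula (14)
beta_12_3 = d12 / d11                   # formulas (10)
beta_13_2 = -d13 / d11
beta_21_3 = d21 / d22
r_12_3 = sqrt(beta_12_3 * beta_21_3)    # formula (15)

b_12_3 = beta_12_3 * sds[0] / sds[1]    # formulas (3)
b_13_2 = beta_13_2 * sds[0] / sds[2]
c = means[0] - b_12_3 * means[1] - b_13_2 * means[2]  # formula (4)
print(round(delta, 5), round(k_1_23, 5), round(r_1_23, 5), round(r_12_3, 5))
print(round(b_12_3, 4), round(b_13_2, 4), round(c, 4))
```

Because the tabulated standard deviations are rounded to four places, the raw-score coefficients and the constant may differ from the printed values in the last decimal.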
Thus, for this problem, we find by referring to the z regression
equation that the school principal's ratings should be given six times as much
weight as the judgment test scores when the two are combined to esti- 
mate honesty scores. However, the nominal weights, due to differences 
in variability, are as 2.1008 is to .2366. The fact that the second co- 
efficient is negative merely means that greater deviation in judgment goes 
with greater dishonesty, and is thus the sign that one would expect. 
The correlation between principal’s ratings and honesty scores, .6076, 
is nearly as high as that between honesty scores and a composite of prin- 
cipal’s ratings and judgment test scores, since this latter is .6164. This 
shows that the judgment test has contributed but little that is not 
already involved in the principal’s ratings. Another evidence of this 
same fact is that the correlation between honesty and principal’s rat- 
ings independent of the judgment test, .5936, is almost as high as the 
total correlation between these two measures, .6076. In the next sec- 
tion the relationships without assuming linear regression will be worked 
out. 


GENERALIZED CORRELATION RATIOS 


Partial and multiple correlation ratios. From Chapter 8, page 128
we have, in the case of linear regression, the value of

$$\sigma_{1.2} = \sigma_1 \sqrt{1 - r^2_{12}}.$$

This is an average of the standard deviations of arrays of $X_1$'s around
the linear regression line. Further, we have

$$\sigma^2_{1.2} = \sigma^2_1 - \sigma^2_{\bar{x}_1},$$

in which $\sigma_{\bar{x}_1}$ is the standard deviation of the means of arrays of $X_1$'s
(see page 140) or the standard deviation of the values of $X_1$ calculated
from the regression equation by assigning values to $X_2$.

From these two equations, we have

$$r_{12} = \frac{\sigma_{\bar{x}_1}}{\sigma_1}.$$

Thus, if regression is linear, the correlation coefficient is the ratio of
these two standard deviations. In Chapter 8 this ratio, independent
of linearity of regression, is defined as the correlation ratio and is
given the symbol $\eta_{12}$. Thus,

$$\eta_{12} = \frac{\sigma_{\bar{x}_1}}{\sigma_1},$$

where $\sigma_{\bar{x}_1}$ is still the standard deviation of the means of arrays, but not
ordinarily the standard deviation of values of $X_1$ calculated from a
linear regression equation.
Dealing with several variables, the same sort of relation holds
between the multiple correlation coefficient and the multiple correlation
ratio. For linear regression we have,

$$\sigma_{1.23 \dots n} = \sigma_1 \sqrt{1 - r^2_{1.23 \dots n}},$$

and also,

$$\sigma^2_{1.23 \dots n} = \sigma^2_1 - \sigma^2_{\bar{x}_1},$$

in which $\sigma_{\bar{x}_1}$ is the standard deviation of the $x_1$ values estimated by means
of the regression equation from a knowledge of the $x_2, x_3, \dots, x_n$ values.

Solving for $r_{1.23 \dots n}$, we obtain,

$$r_{1.23 \dots n} = \frac{\sigma_{\bar{x}_1}}{\sigma_1}.$$

Thus, if regression is linear, the multiple correlation coefficient is the
ratio of two standard deviations. This ratio maintains significance if
the regression is non-linear, but we now give it a new name and
symbol. It is called the multiple correlation ratio. Thus,

$$\eta_{1.23 \dots n} = \frac{\sigma_{\bar{x}_1}}{\sigma_1}. \qquad (18)$$


The calculation of $\eta_{1.23 \dots n}$ is straightforward though laborious in case
the number of variables and the number of categories per variable is not
small. The standard deviation $\sigma_1$ is simply the standard deviation of
the $x_1$ measures when included in a single distribution. The magnitude
$\bar{x}_1$ is the mean of the $x_1$ measures lying within a single cell of the
multiple correlation surface. Thus, if there are four variables, $x_1, x_2, x_3, x_4$,
and if $x_2$ has ten classes or categories; $x_3$, 15; and $x_4$, 12; there are, all
told, 1800 (= 10 × 15 × 12) cells, and 1800 different values of $\bar{x}_1$. The
standard deviation of these, taking each as many times as there are
cases in the cell, is the $\sigma_{\bar{x}_1}$ desired. From the statement just made it
is obvious that $x_2, x_3, x_4$, etc., need not be graded variables. All of the
independent variables may be strictly categorical, it only being necessary
that the dependent variable $x_1$ be graded in order to calculate a
correlation ratio. It is also obvious that the cell populations, in case of 1800
cells, would be very small, unless a very large total population is dealt
with. As a consequence, unless the population is several times the
number of cells, there is a very large grouping error, tending
systematically to make the multiple correlation ratio too large. Pearson¹ and
Student² have taken the initial steps in correcting for this error. If

1 Karl Pearson, "On a Correction to be Made to the Correlation Ratio,"
Biometrika, vol. 8 (1911), p. 254.

2 Student, "The Correction to be Made to the Correlation Ratio for Grouping,"
Biometrika, vol. 9 (1913), p. 316.
the grouping is too coarse there is an error of a different sort, but, in
general, the value of the raw multiple correlation ratio will be more
accurate if grouping is so coarse that no cell frequency is less than 2, than
if a finer grouping is utilized.
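Formula (18) may be sketched as follows. The function name `correlation_ratio` and the representation of the correlation surface as a dictionary from cell label to the list of dependent-variable values in that cell are assumptions made for illustration, and the small data sets are invented solely to show the two limiting cases. No correction for grouping error is attempted.

```python
from math import sqrt


def correlation_ratio(cells):
    """Multiple correlation ratio: s.d. of cell means divided by total s.d.

    `cells` maps a cell label (for example a tuple of categories of the
    independent variables) to the list of dependent-variable values
    falling in that cell. Each cell mean is weighted by its frequency.
    """
    values = [v for obs in cells.values() for v in obs]
    n = len(values)
    grand_mean = sum(values) / n
    total_var = sum((v - grand_mean) ** 2 for v in values) / n
    between_var = sum(len(obs) * (sum(obs) / len(obs) - grand_mean) ** 2
                      for obs in cells.values() if obs) / n
    return sqrt(between_var / total_var)


# Invented cells: all variation lies between cells, so the ratio is 1.
eta_all_between = correlation_ratio({("a", 3): [1, 1], ("b", 3): [3, 3]})
# Invented cells: both cell means equal the grand mean, so the ratio is 0.
eta_none_between = correlation_ratio({("a", 3): [1, 3], ("b", 3): [2, 2]})
print(eta_all_between, eta_none_between)
```

A ratio computed this way from sparsely filled cells is subject to the upward grouping error described above.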

The concept of a partial correlation ratio follows directly from that
of a partial correlation coefficient. Formula (13) gives,

$$k_{12.34 \dots n} = \frac{k_{1.23 \dots n}}{k_{1n} \, k_{1(n-1).n} \cdots k_{13.45 \dots n}},$$

yielding a partial coefficient of alienation of order (n - 2) in terms
of the multiple coefficient of alienation, and partial coefficients of lesser
order. A similar relationship holds if non-linear regression is being
measured¹ so that we have,

$$\sqrt{1 - \eta^2_{12.34 \dots n}} = \frac{\sqrt{1 - \eta^2_{1.23 \dots n}}}{\sqrt{(1 - \eta^2_{1n})(1 - \eta^2_{1(n-1).n}) \cdots (1 - \eta^2_{13.45 \dots n})}}. \qquad (19)$$

The same observations as to grouping and as to the categorical nature
of the variables hold here as in the case of the multiple correlation ratio.
In the case of multiple partial correlation ratios, just as in the case of
the total correlation ratios, subscripts before the point are not
interchangeable. Thus, in general

$$\eta_{12.3} \neq \eta_{21.3}, \quad \text{but} \quad \eta_{1.23} = \eta_{1.32}.$$


A sample three-variable problem. We may illustrate formulas (18)
and (19) using the data given earlier in this chapter. Let it be required
to find the correlation ratio of estimates of honesty upon disciplinarian's
records and score in the judgment tests. We have nine classes in $x_3$
and three classes in $x_2$, making a total of 27 cells. Of these, seven have
zero frequencies and five have frequencies of one each. The grouping in $x_3$
is not as coarse in the region of the upper and lower ends of the
distribution as might be desirable, but, taking it as it stands, we have for the
($x_2 = \alpha$, $x_3 = 3$) cell a mean $\bar{x}_1 = 9.5$ and a cell frequency of 2; for
the ($x_2 = \alpha$, $x_3 = 4$) cell a mean $\bar{x}_1 = 9.857$ and a cell frequency of 14;
etc., for the remaining cells. The standard deviation of these means,
taking each as many times as the population of the cell, is by the usual
calculation equal to 1.971 and $\sigma_1$ as given on page 145 is 2.929. Thus,
$\eta_{1.23} = .6729$. This value may be compared with $r_{1.23}$, which was
found on page 145 to equal .6164.


1 Karl Pearson, "On the Partial Correlation Ratio," Proc. Roy. Soc., A, vol. 91
(1915).
Let it be required to find the partial correlation ratio of $x_1$ upon $x_3$
when $x_2$ is constant. Formula (19) states,

$$1 - \eta^2_{13.2} = \frac{1 - \eta^2_{1.23}}{1 - \eta^2_{12}}.$$

Since $\eta_{12}$ by the calculation of Chapter 8 is found to equal .5900, we
obtain $\eta_{13.2} = .4009$. That is to say that, independent of the bearing or
relationship of disciplinarian's records, there is a correlation ratio of
.4 between teachers' estimates of honesty and test score. This value
is probably somewhat too large on account of the grouping error.
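The three-variable case of formula (19) reduces to a one-line computation, sketched here with the rounded values quoted in the text. The function name `partial_eta` is illustrative.

```python
from math import sqrt


def partial_eta(eta_multiple, eta_total):
    """Partial correlation ratio eta_{13.2} from eta_{1.23} and eta_{12}."""
    return sqrt(1.0 - (1.0 - eta_multiple ** 2) / (1.0 - eta_total ** 2))


eta_13_2 = partial_eta(0.6729, 0.5900)
print(round(eta_13_2, 4))
```

Starting from the four-place inputs .6729 and .5900 the sketch gives about .4007, which agrees with the printed .4009 to within what rounding of the inputs could produce.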

When a student first gains a knowledge of the general principles of 
partial and multiple correlation and added power in analysis resulting 
from their use, there is a danger of misuse. As a general principle, it is
better to separate experimentally, if possible, the various factors than 
to determine their separate relationships statistically by means of par- 
tial correlation. In many cases the experimental and statistical analy- 
ses can both be used, thus supplementing each other. 

The most subtle danger is that the tyro will interpret partial coeffi- 
cients of correlation as measuring causal relationships, whereas in fact 
the cause of a relationship is no more betrayed by a partial coefficient 
in the case of a number of variables than by a total coefficient of corre- 
lation in the case of two. 

There is less danger in interpreting a multiple coefficient of correla- 
tion as it is readily placed in the same category as the total correlation 
coefficient. The care to be exercised here should be to see that much of 
significance is not lost by treating regression relationships as linear. 
This point may be numerically tested by seeing if the multiple correla- 
tion coefficient is nearly as large as the multiple correlation ratio. If 
several variables are involved, the process would be laborious and the 
probable error of the difference 

$$\eta_{1.23 \dots n} - r_{1.23 \dots n}$$
would undoubtedly be large, so that one is thrown back upon the ne- 
cessity of making supplementary investigations and exercising his good 
judgment as to whether linear relationships may reasonably be assumed. 

If relationships are linear, if the accuracy consequent to the size of 
the population dealt with is appreciated, if causal relationships are not 
assumed, and if conclusions are not extended to populations which are 
not homogeneous with the sample, then there is full warrant for use of 
the powerful analytical devices of multiple and partial correlation. 


CHAPTER X 


CORRELATION OF TIME SERIES 
By WARREN M. PERSONS 


Each item of a time series of statistics is an aggregate or average or
relative number applying to a definite interval or point of time. Unless 
otherwise specified, the several items of a series are understood (1) to 
refer to equal time units, (2) to be consecutive in time, and (3) to be 
constructed according to a fixed criterion or standard. Illustrations 
of time series are: the population of continental United States at each 
census, the aggregate pig-iron production per month of a representative 
number of blast furnaces, average rates on commercial paper each week 
or month, index numbers of prices on the first of each month. 

The items of time series must be defined for selected units of time — 
the week, the month, the quarter, the year. Because of this fact the 
items are ordered in time and therefore are affected by the same or 
related influences during adjacent time-intervals. In other words, 
each time series is a function of time and, presented graphically, has a 
characteristic conformation. In this respect time series differ from 
other series of statistics, such as the wage rates of different individuals 
or the populations of different countries at a given time. 

Four types of variations are commonly found in the ordered items of 
time series. They are, first, variations which occur within each year 
as a consequence of the round of the seasons by which the items for 
certain weeks or months are regularly higher or lower than those for 
other weeks or months of the year; second, a long-time movement or 
secular trend covering a considerable period of years by which the 
average size of the items makes a permanent gain or suffers a permanent 
loss; third, irregular fluctuations resulting from wars, panics, strikes, 
etc.; and fourth, wave-like or cyclical movements — which may or may 
not be periodic — connected with the ebb and flow of business. 

If our problem is to ascertain the relationship between two series 
ordered in time it is of little avail (or actually misleading) to compute 
the coefficient of correlation from pairs of the actual items. In case the 
two series possess definite trends or seasonal variation the coefficient of 
correlation for the items will yield a value different from zero. Having
found such a coefficient we would be unable to say what contributed
most largely to the result: similar (or diverse) trends, seasonal
variations, cyclical movements, or irregular fluctuations. Generally in the
comparison of two time series it is the relation between their respective 
cyclical variations that is most interesting from the practical point of 
view. In order that this relation may be set forth either graphically 
or by use of the correlation coefficient, it is necessary to remove from the 
actual items that portion of their values ascribable to secular trend and 
seasonal variation. It would be desirable also to eliminate the irregu- 
lar fluctuations, but this appears to be impossible in general because, 
by definition, such fluctuations are unsystematic. Our problem, then, 
resolves itself into two parts, first, the measurement and elimination of 
seasonal variation and secular trend from each series under
investigation, and, second, the measurement of the correlation between the two
series of items thus “‘ corrected.” 
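The misleading effect of a common trend can be illustrated with a simulation. This is a sketch under strong assumptions: each series is an invented straight-line trend plus independent normal fluctuations, the trend is removed by a least-squares straight line (only one of many possible trend forms), and seasonal variation is left out altogether.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def deviations_from_line(ys):
    """Deviations of the items from a least-squares straight-line trend."""
    n = len(ys)
    t_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    slope = (sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys))
             / sum((t - t_mean) ** 2 for t in range(n)))
    return [y - (y_mean + slope * (t - t_mean)) for t, y in enumerate(ys)]


rng = random.Random(1)
n = 600
series_a = [100 + 0.5 * t + rng.gauss(0, 5) for t in range(n)]
series_b = [40 + 0.2 * t + rng.gauss(0, 3) for t in range(n)]

r_actual_items = pearson_r(series_a, series_b)
r_corrected = pearson_r(deviations_from_line(series_a),
                        deviations_from_line(series_b))
print(round(r_actual_items, 3), round(r_corrected, 3))
```

The actual items show a coefficient close to unity that reflects nothing but the two trends; the deviations from trend, which here are unrelated by construction, show a coefficient near zero.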


SEASONAL VARIATION 


The problem of measuring seasonal variation is that of isolating from 
a time series a typical movement having a period of one year. In attack- 
ing this problem it is essential to adapt our methods to the material 
which we are using: time series in which seasonal variations, unlike 
those of the physical sciences, do not usually occur with a high degree 
of uniformity. Furthermore, the problem is greatly complicated, espe- 
cially when economic series are our material, both by numerous irregu- 
lar fluctuations and by lack of homogeneity over a long interval of time. 

Suppose that there is given a series of economic statistics, the monthly
items of which we shall designate by y, and that we seek to measure the
seasonal variation. The procedure is as follows:¹

First, for every month, except the first, calculate the link relative,
which is the ratio of each item of the series to the preceding item, or
$y_i / y_{i-1}$. In this way each January item is expressed as a percentage of
the preceding December, each February item as a percentage of the
preceding January, and so on.

Second, arrange the January link relatives in a frequency table, the 
February link relatives in another adjacent frequency table, and so 


1 The method is, essentially, the one described in the Review of Economic Statistics, 
January, 1919, pp. 18-31, of the article by W. M. Persons on ‘Indices of Business 
Conditions.” 
on for the remainder of the months. An illustration of the several
frequency tables thus secured for the monthly rate on 60-90 day com- 
mercial paper in New York, January, 1890-January, 1917, is given in 
Figure 9. 

Third, find the median of each frequency table.¹

Fourth, express each median link relative as a percentage based on 
January by progressively multiplying or “chaining” the medians. 
Thus, January is taken as 100; the product of this 100 by the February 
median (percentage) gives the February item of the chain series; the 
product of this result by the March median gives the March item; and 
in a similar way we get successively all of the items to December of the 
chain series. If the computed December item is multiplied by the Jan- 
uary median, we shall find that the result will not ordinarily be 100 but 
some other percentage. That is, the process of chaining medians, or 
any averages except the geometric means of a series in which the initial 
and final items are identical,² gives rise to some discrepancy.

Fifth, distribute the discrepancy among all the items of the chain 
series, using either a geometric or an arithmetic basis for adjustment. 
The object of the adjustment is to make the computed January chain 
relative equal 100. 

Finally, alter proportionally the revised relatives thus secured so that 
their arithmetic mean shall be 100. These final figures are the adjusted 
monthly indexes of seasonal variation. 
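The first three steps (link relatives, grouping by calendar month, and medians) may be sketched as below. The function name `median_link_relatives`, the requirement that the series begin with a January, and the synthetic series built from an invented seasonal pattern are assumptions for illustration only.

```python
from statistics import median


def median_link_relatives(monthly):
    """Median link relative for each calendar month.

    `monthly` holds consecutive monthly items beginning with a January and
    must contain at least 13 items, so that every month, January included,
    has at least one link relative. Index 0 of the result is the January
    median (January item divided by the preceding December item).
    """
    by_month = [[] for _ in range(12)]
    for i in range(1, len(monthly)):
        by_month[i % 12].append(monthly[i] / monthly[i - 1])
    return [median(relatives) for relatives in by_month]


# Invented seasonal pattern repeated for three years, plus a closing January.
pattern = [1.00, 0.95, 1.00, 0.98, 0.95, 0.92,
           0.98, 1.05, 1.12, 1.15, 1.12, 1.10]
series = [50 * pattern[i % 12] for i in range(37)]
medians = median_link_relatives(series)
print([round(m, 4) for m in medians])
```

With a series that repeats exactly, each median equals the ratio of adjacent pattern values; with actual data the medians damp the influence of extreme link relatives.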

Expressed in mathematical symbols the process of finding the ad- 
justed indexes of seasonal variation is as follows: 

Let $r_1, r_2, r_3, \dots, r_{12}$ be the medians of the link relatives (expressed as
decimals).

Let $100, c_2, c_3, \dots, c_{12}$ be the chain relatives obtained by progressive
multiplication, with January as 100.

Then, $c_2 = 100 \, r_2$, $c_3 = c_2 r_3$, $c_4 = c_3 r_4$, $\dots$, $c_{12} = c_{11} r_{12}$.

In this series of equations $r_1$ and $c_1$ do not appear. If we compute a
value for January (represented by $c'_1$) we shall have $c'_1 = c_{12} r_1$. The

1 The average of three or four central items might appropriately be taken instead
of the median. In case of a very large number of items in the frequency tables
having a clearly marked class of concentration, the measurement corresponding to that
class, the mode, might appropriately be taken as the typical seasonal relative.
Economic series are not, however, sufficiently long or homogeneous to make the use of
the mode practicable.

2 The discrepancy for a chain series of geometric means in which $y_1$ and $y_n$ are the
initial and final actual items (referring to the same calendar month) of the series is
$\sqrt[j]{y_n / y_1}$, where $j$ is the number of years covered by our monthly series.
product $c_{12} r_1$ will not, in general, give 100 but some discrepancy in
excess or defect of 100.

Assuming that the discrepancy may be appropriately distributed by
applying a constant factor 1 + d to the monthly medians of link
relatives, we have

$$100 \, (1 + d)^{12} = c_{12} r_1,$$

from which 1 + d may be computed.

If, now, we adjust the chain relatives as follows:

$$100, \ \frac{c_2}{1 + d}, \ \frac{c_3}{(1 + d)^2}, \ \frac{c_4}{(1 + d)^3}, \ \dots, \ \frac{c_{12}}{(1 + d)^{11}}, \ \frac{c_{12} r_1}{(1 + d)^{12}},$$

the last term will be 100 and the discrepancy thus disappears.

The computation is greatly facilitated by the use of logarithms. A 
scheme for the computation is given in Table I. 


TABLE I. COMPUTATION OF INDEXES OF SEASONAL VARIATION
(Rates on 60-90 Day Paper, Jan., 1890-Jan., 1917)¹

Columns: (1) median of link relatives; (2) logarithm of median link
relative (log r); (3) logarithm of chain relatives with Jan. = 100 (log c);
(4) logarithm of adjustment factor,² log (1 + d)^t; (5) logarithm of
adjusted chain relatives (log a); (6) adjusted chain relatives (a);
(7) index of seasonal variation, average for year = 100 (s).

MONTHS      (1)     (2)       (3)       (4)       (5)       (6)      (7)

Jan.         89    1.9494    2.0000    0.0000    2.0000    100.0     97.0
Feb.         96    1.9822    1.9822    0.0036    1.9786     95.2     92.4
Mar.        105    2.0212    2.0034    0.0072    1.9962     99.1     96.1
Apr.         99    1.9956    1.9990    0.0109    1.9881     97.3     94.4
May          98    1.9912    1.9902    0.0140    1.9762     94.7     91.9
June         98    1.9912    1.9814    0.0181    1.9633     91.9     89.2
July        108    2.0334    2.0148    0.0217    1.9931     98.4     95.4
Aug.        109    2.0374    2.0522    0.0253    2.0269    106.4    103.3
Sept.       108    2.0334    2.0856    0.0290    2.0566    113.9    110.5
Oct.        102    2.0086    2.0942    0.0326    2.0616    115.2    111.8
Nov.         98    1.9912    2.0854    0.0362    2.0492    112.0    108.6
Dec.        102    2.0086    2.0940    0.0398    2.0542    113.3    110.0
Jan.         89    1.9494    2.0434    0.0434    2.0000

Arithmetic average                                         103.1    100.0


1 Based on data published in the Review of Economic Statistics, January, 1923, p. 28.
2 Time ($t$) measured in months from January. $\log(1 + d) = \frac{1}{12}(0.0434)$. To adjust by an
arithmetic process we would of course apply the equation of condition:

$$d' = \tfrac{1}{12}\,(c_{12} r_1 - 100).$$

There is theoretical preference for geometric distribution of the discrepancy, but the two methods are
unlikely to yield significantly different results in ordinary practical cases.
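The scheme of Table I can be retraced with a short computation. The sketch below is illustrative only: the function name and layout are assumptions, and the medians are entered as the rounded two-figure values printed in the first column of the table, so the results agree with the printed columns only to about the accuracy of four-place logarithms.

```python
import math

# Medians of link relatives from Table I, as decimals, in the order
# r_2 (Feb./Jan.), r_3, ..., r_12 (Dec./Nov.), followed by r_1 (Jan./Dec.).
medians = [0.96, 1.05, 0.99, 0.98, 0.98, 1.08, 1.09, 1.08, 1.02, 0.98, 1.02, 0.89]

def adjusted_chain_relatives(link_medians):
    """Return (adjusted chain relatives incl. closing January, seasonal indexes)."""
    # Progressive multiplication with January = 100: c_2 = 100 r_2, c_3 = c_2 r_3, ...
    chain = [100.0]
    for r in link_medians:
        chain.append(chain[-1] * r)          # the final entry is c_12 * r_1
    # 100 (1 + d)^12 = c_12 r_1, solved with logarithms as in the text.
    log_factor = (math.log10(chain[-1]) - 2.0) / 12.0
    # Month t (t = 0 for January) is divided by (1 + d)^t.
    adjusted = [10 ** (math.log10(c) - t * log_factor) for t, c in enumerate(chain)]
    # Seasonal index: the twelve adjusted relatives rescaled so their mean is 100.
    mean = sum(adjusted[:12]) / 12.0
    indexes = [100.0 * a / mean for a in adjusted[:12]]
    return adjusted, indexes

adjusted, indexes = adjusted_chain_relatives(medians)
```

Here `adjusted[12]`, the closing January, returns to 100, which is the condition the factor $1 + d$ was chosen to satisfy; `indexes` corresponds to the last column of the table.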


The advantages of using the method just outlined, compared, for 
instance, with the method of arithmetic means of the actual January 
items, February items, etc., are as follows: 
1. The frequency distributions of link relatives enable one to judge
the degree of regularity of month-to-month (or seasonal) changes. The
closer the grouping of relatives for a given month about any value, the
more pronounced and significant is the seasonal movement for that
month.

2. The use of the median (or average of central items) is a device by 
which the influence of extremely large non-seasonal variations (such, 
for instance, as the sudden decline of money rates after the defeat of 
Bryan in November, 1896, or the rapid rise in August, 1914) may be 
greatly moderated. 

3. It is possible to utilize non-homogeneous statistical series in the 
measurement of seasonal variation. For instance, link relatives for 
the bank clearings of 50 representative cities for one interval may be 
combined with link relatives for the clearings of 100 representative 
cities for another interval. Likewise, the method may be used when 
only adjacent pairs of items are comparable. 
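The second advantage can be illustrated numerically. The figures below are hypothetical, not taken from the series discussed in the text; they are chosen only to show how one large non-seasonal movement affects the arithmetic mean of a month's link relatives while leaving the median almost untouched.

```python
import statistics

# Hypothetical November link relatives (Nov./Oct.) for nine years. The value
# 0.60 stands in for a single violent non-seasonal decline in one year.
november_links = [0.97, 0.98, 0.99, 0.98, 0.60, 0.97, 1.00, 0.98, 0.99]

mean_link = statistics.mean(november_links)      # pulled down by the one extreme year
median_link = statistics.median(november_links)  # central item, unaffected by it
```

With these assumed figures the median stays at 0.98, the typical value of the undisturbed years, whereas the mean falls to 0.94.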


Series of economic statistics sometimes occur, covering an interval of 
10 or 15 years, in which the items are strictly homogeneous and not 
affected by large irregular fluctuations. In such cases the method of 
arithmetic or geometric means of all the actual items for corresponding 
months may be used in determining indexes of seasonal variation.²


If we assume (as does W. L. Hart in “The Method of Monthly Means for
Determination of a Seasonal Variation,” Jour. Amer. Statis. Ass’n, September,
1922, pp. 341-349) that monthly series of economic data are described with
sufficient accuracy by trigonometric functions, such as
$f(t) = A + B \sin t(30^\circ) + C \sin t\left(\frac{30^\circ}{n}\right)$,
then it may be demonstrated that the method of monthly
means gives the correct seasonal variation for such functions. Thus, in the
function given above, where
$t$ = time in months,
$n$ = an integer (two or greater),
$B \sin t(30^\circ)$ = seasonal variation component, period of 1 year,
$C \sin t\left(\frac{30^\circ}{n}\right)$ = cyclical variation component, period of $n$ years,
the arithmetic
mean of the values of the function $f(t)$ for corresponding months (each July, for


1 W. L. Crum, “The Use of the Median in Determining Seasonal Variation,” Jour.
Amer. Statis. Ass’n, March, 1923.

2 If in the median method (previously described) geometric averages of the link
relatives had been used (instead of medians) and these averages had been multiplied
together progressively to secure a continuous series of relatives with a fixed base,
the result would have been identically the same as that secured by expressing the
original items in terms of a fixed base and then taking the geometric averages of these
percentages.
instance) during a complete cycle ($n$ years) will be $A + B \sin t(30^\circ)$. Since
the arithmetic mean of all values of the series is $A$, we get the seasonal variation
components by subtracting the average for all months from the average of each
of the 12 groups of corresponding months.

Instead of building up an additive function suppose we construct a product
function as follows:

$$f(t) = A \cdot B^{\sin t(30^\circ)} \cdot C^{\sin t(30^\circ/n)}.$$

For this function the geometric mean of the values for corresponding months
during a complete cycle will give the correct seasonal variation. The demonstration
of the statement just made follows immediately from the theorem cited by
Mr. Hart.¹
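The statement about the additive function can be checked numerically. The sketch below is not Mr. Hart's demonstration; it merely evaluates an additive function of the kind described, with assumed constants (A = 100, B = 8, C = 15, n = 4), and applies the method of monthly means to one complete cycle.

```python
import math

def f(t, A=100.0, B=8.0, C=15.0, n=4):
    """Additive series: level + seasonal term (period 12 months) + cycle (period n years)."""
    seasonal = B * math.sin(math.radians(30 * t))
    cyclical = C * math.sin(math.radians(30 * t / n))
    return A + seasonal + cyclical

def monthly_means(n=4):
    """Arithmetic mean of f over the n occurrences of each calendar month."""
    return [sum(f(m + 12 * k, n=n) for k in range(n)) / n for m in range(12)]

means = monthly_means()
overall = sum(means) / 12                # mean of all values of the series
seasonal = [m - overall for m in means]  # seasonal components by subtraction
```

Under these assumptions `overall` equals A and each entry of `seasonal` equals $B \sin t(30^\circ)$ for its month, the cyclical term having averaged out over the complete cycle.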


Comparison of the results obtained by the median and the arithmetic
average methods of securing indexes of seasonal variation for an
illustrative case (the longest monthly series available, rates on
commercial paper since 1866) is given in Figure 10.² Examination of the
actual data, 1866-1913, showed that there were the following four
periods with characteristic fluctuations:


1866-73: Violent fluctuations on a high level 
1874-89: Narrow fluctuations on a lower level 
1890-99: Marked irregular fluctuations 
1900-13: Very few irregularities 


Two sets of seasonal indexes, one based on medians of the link relatives, 
the other on arithmetic averages of the actual items, were computed 
for each of these four periods and, in addition, for the period 1890-1916. 
The five pairs of indexes are presented graphically. It is evident that 
the two sets of indexes for the periods 1874-89 and 1900-13, in which 
there are no important irregular variations, almost coincide; while the 
pairs of indexes for the disturbed periods 1866-73 and 1890-99 differ 
widely. However, if the erratic and clearly non-seasonal fluctuations of 
the panic months of the autumn of 1873 be omitted from the arithmetic 
averages, the resulting indexes for 1866-73 will be very much closer to 
the indexes based upon medians.³ Likewise, if the panic year 1893 with
its wide fluctuations be omitted from the arithmetic averages for 1890- 
99, the resulting indexes will closely approximate the median indexes. 

1 Bôcher, Annals of Mathematics, Second Series, vol. 7 (1906), p. 135, Formula 63.

2 The data upon which these computations are based may be found in the Rev. of
Economic Statis., Jan., 1923, p. 28. The table has been revised slightly since the
computation of the seasonal indexes here quoted.

3 For instance, exclusion of 1873 from the arithmetic averages results in altering
the seasonal index for October from 120.4 to 111.7; the index based upon the medians
for all years is 110.9.
[Figure 10. Five panels, one for each period, each showing the two sets of
monthly indexes. Notes printed on the panels:]

1866-73: Gold corner, Sept. 24, 1869. Panic, Sept., 1873; excluding 1873
from arithmetic averages gives values indicated by arrows for the panic
months Aug.-Dec.

1874-89: No marked disturbances, but March rates rose abnormally in 1879
(gold resumption on Jan. 1) and in 1883 (business turned down Nov., 1882;
French crisis, Jan., 1883); excluding March, 1879 and 1883, gives value
indicated by arrow.

1890-99: Baring failure, Nov. 15, 1890. Panic began May, 1893; excluding 1893
gives values indicated by arrows. Free silver menace, July-Oct., 1896.
War with Spain, April, 1898.

1900-1916: Panic, Oct.-Nov., 1907. War declared, July-Aug., 1914. Federal
Reserve System in operation since 1913.

1890-1916: Excluding 1890, 1893, 1896, 1898, 1907, and 1914 from arithmetic
averages gives values indicated by arrows.

FIG. 10. COMPARISON OF THE INDEXES OF SEASONAL VARIATION FOR RATES ON
60-90 DAY COMMERCIAL PAPER COMPUTED FOR VARIOUS PERIODS BY (A) THE
MEDIAN METHOD AND (B) THE ARITHMETIC AVERAGE METHOD

Legend: Median Method; Arithmetic Average Method (broken line).
Finally, the selective effect of using the median method is clearly
shown when the period 1890-1916, having both highly disturbed and 
comparatively undisturbed sections, is taken. Omission of the dis- 
turbed years 1890, 1893, 1896, 1898, 1907, and 1914 (with their non- 
seasonal fluctuations) from the arithmetic averages gives results which 
agree (within the limits of accuracy of our data) with those obtained 
by the median method in which all of the data were utilized. The 
seasonal indexes for the period 1890-1916, given in Table II, are based 
upon (A) the arithmetic averages for 1890-1916, (B) the arithmetic 
averages for 1890-1916 excluding 6 disturbed years, and (C) the median 
link relative method for 1890-1916. The absolute sum of the differ- 
ences between corresponding indexes A and B is 20.9, and that between 


TABLE II. INDEXES OF SEASONAL VARIATION¹

METHOD                  JAN.   FEB.   MAR.   APR.   MAY    JUNE   JULY   AUG.    SEPT.   OCT.    NOV.    DEC.

Arithmetic average A²   96.6   90.7   97.4   95.0   91.2   90.7   97.2   106.3   111.7   111.5   106.5   105.6
Arithmetic average B²   98.2   92.4   97.7   95.0   91.1   90.2   95.4   101.8   108.5   111.0   107.8   111.0
Median C³               98.8   92.1   95.9   94.1   91.5   88.9   95.2   102.9   110.2   111.5   108.4   110.7

1 Computed from data published in Indices of General Business Conditions, by W. M. Persons, pp. 98-9.

2 Indexes “A” are based upon the 27 years 1890-1916; indexes “B” are based upon the same period
excluding the disturbed years 1890, 1893, 1896, 1898, 1907, 1914.

3 Indexes “C” differ somewhat from those given in the last column of Table I because the latter
are based upon the revised data published in the Rev. of Economic Statis., Jan., 1923, p. 28.


indexes B and C is only 9.7. It is clear, therefore, that the non-seasonal
movements of disturbed years distort the seasonal indexes obtained by 
the method of arithmetic averages. Moreover, it may be pointed out, 
the extremely high or low items of an economic series are precisely the 
ones concerning which the accuracy of the data is in greatest doubt. 
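The two sums quoted from Table II can be verified directly. In the sketch below the twelfth value in each row is taken to be December, an assumption made because the printed heading of that column is missing; the variable and function names are illustrative.

```python
# Rows of Table II, January through December.
A = [96.6, 90.7, 97.4, 95.0, 91.2, 90.7, 97.2, 106.3, 111.7, 111.5, 106.5, 105.6]
B = [98.2, 92.4, 97.7, 95.0, 91.1, 90.2, 95.4, 101.8, 108.5, 111.0, 107.8, 111.0]
C = [98.8, 92.1, 95.9, 94.1, 91.5, 88.9, 95.2, 102.9, 110.2, 111.5, 108.4, 110.7]

def absolute_sum_of_differences(u, v):
    """Sum of the differences between corresponding indexes, taken without sign."""
    return round(sum(abs(a - b) for a, b in zip(u, v)), 1)

a_vs_b = absolute_sum_of_differences(A, B)   # full period vs. disturbed years omitted
b_vs_c = absolute_sum_of_differences(B, C)   # disturbed years omitted vs. median method
```

The first sum is more than twice the second, which is the comparison on which the conclusion of the text rests.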


SECULAR TREND¹


The problem of measurement of the secular trend of a time series is 
that of fitting a straight line or curve to the graph obtained by plotting 
times as abscissas and the items as ordinates. In computing the trend 
for a monthly series extending over a number of years it is not necessary 
to work with monthly items; a series of annual averages may be sub- 
stituted and much unnecessary computation avoided. Usually we 
may assume that the trend is a straight line; in other words, that if the 
trend were the only source of variation in the series, the actual change 
in value between two consecutive months would be strictly constant. 


1 For a discussion of the general problem of curve fitting see Chapter IV. 
In such a case the particular straight line which represents the trend is
the one which best “fits” the data, the criterion of fit being that the
sum of the squares of the deviations¹ of the points corresponding to the
data from the line shall be a minimum.

To determine this straight line for a series of annual items $y_1, y_2, y_3,
\ldots, y_n$, proceed as follows: Measure time ($t$) from the central item of
the series, or, if the number of items is even, from the point midway
between the pair of central items. For an odd number of annual items
the abscissas would be $\ldots, -3, -2, -1, 0, +1, +2, +3, \ldots$, and
the unit one year; for an even number of items the origin would be
midway between the central pair and, to avoid fractions in the
computation, the abscissas would be $\ldots, -5, -3, -1, +1, +3, +5, \ldots$,
and the unit one-half year. If the origin for time be so chosen, the
annual increment of the line of trend will be $\frac{\sum yt}{\sum t^2}$; and the
$y$-intercept will be $\frac{\sum y}{n}$, or the arithmetic mean of the items.² The
slope of the line of secular trend can easily be found, therefore, by the
following computations: First, multiply the original items by a series of
integers denoting units of time, half of which are positive in sign and
half negative; second, find the algebraic sum of the products thus
secured; third, find the sum of the squares of the series of integers;
and, fourth, take the ratio of the first of these sums to the second.
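The four steps just listed may be sketched as follows. The function name and the sample items are assumptions made for illustration; note that for an even number of items the ratio obtained is the increment per half year, since the unit of time is then one-half year.

```python
def linear_trend(items):
    """Least-squares straight line through equally spaced annual items.

    Returns (mean of items, increment per unit of time, time values). The unit
    of time is one year for an odd number of items, one-half year for an even number.
    """
    n = len(items)
    if n % 2 == 1:
        times = [i - (n - 1) // 2 for i in range(n)]   # ..., -2, -1, 0, +1, +2, ...
    else:
        times = [2 * i - (n - 1) for i in range(n)]    # ..., -3, -1, +1, +3, ...
    sum_yt = sum(y * t for y, t in zip(items, times))  # steps one and two
    sum_tt = sum(t * t for t in times)                 # step three
    return sum(items) / n, sum_yt / sum_tt, times      # step four gives the slope

# Hypothetical annual averages rising by exactly 3 units a year.
odd_mean, odd_slope, _ = linear_trend([10, 13, 16, 19, 22])
even_mean, even_slope, _ = linear_trend([10, 13, 16, 19])
```

For the five items the slope is 3 per year; for the four items it is 1.5 per half year, which is the same annual increment.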

The object of finding a line or curve of secular trend is primarily to 
secure a function describing the general movement of a time series for 
a completed historical interval, covering, if possible, several business 
cycles. The function which best fits the data of a completed historical 
interval ending with the present, however, is not necessarily the best 
function for estimating the future trend. For estimating future trends 


1 The deviations mentioned are measured parallel to the axis of ordinates. 


2 $y$ = annual average of monthly items.
$t$ = time in years measured from the center of the period covered.
$n$ = number of years in period.

$\frac{\sum y}{n}$ = ordinate of trend for center of period.
$\frac{1}{12}\,\frac{\sum yt}{\sum t^2}$ = monthly increment of trend.


$\frac{\sum y}{n} \mp \frac{12n - 1}{24}\,\frac{\sum yt}{\sum t^2}$ = the ordinates of secular trend for the first and last months
respectively of the period.


When the number of years is even, substitute $\frac{t}{2}$ for $t$ in the above formulas.
both the type of function and the interval should be selected with ref-
erence to that specific problem. 


ELIMINATION OF SEASONAL VARIATION AND SECULAR 
TREND 


Assuming that we have measured satisfactorily both seasonal variation
and secular trend, the next problem is to “correct” for these
factors the original items of the time series. Let the original items be
represented by $y$, corresponding indexes of seasonal variation by $s$, and
corresponding ordinates of secular trend (in terms of the original unit
of tons, dollars, etc.) by $o$. If there were no irregular or cyclical
variations affecting the series, each item would be represented by $so$, which
we may appropriately designate the “normal” value.¹ Since the original
item is represented by $y$, the expression $y - so$ is the deviation
from “estimated normal,” and $\frac{y - so}{so}$ or $\frac{y}{so} - 1$ is the relative
deviation from estimated normal. This is the formula according to which we
compute our “corrected” series.²

If it be desired to correct the original items for seasonal variation
alone, the form $\frac{y}{s}$ may be used, the result being expressed in terms of
the units of the original series.
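The two corrections may be written out as a short sketch. The function names and the sample figures are hypothetical, and the seasonal index is assumed to be entered as a ratio (an index of 110 being written 1.10) so that the product $so$ is in the units of the original series.

```python
def relative_deviation(y, s, o):
    """Relative deviation from estimated normal: (y - s*o) / (s*o) = y / (s*o) - 1."""
    return y / (s * o) - 1.0

def seasonally_corrected(y, s):
    """Item corrected for seasonal variation alone, in the units of the original series."""
    return y / s

# Hypothetical item: 115 units observed in a month whose seasonal index is 110
# (entered as 1.10) and whose ordinate of secular trend is 100 units.
deviation = relative_deviation(115.0, 1.10, 100.0)
corrected = seasonally_corrected(115.0, 1.10)
```

With these assumed figures the estimated normal is 110 units, so the item stands about 4.5 per cent above normal.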


COMPARISON AND CORRELATION OF CORRECTED SERIES 


It is now possible to compare the cyclical variations of two or more
“corrected” series, and the chief means available is the graphic method.
If we are interested in comparing the relative violence of fluctuations
of the various series, the corrected items (as described above) of each
series should be plotted on translucent paper with time measured in
months as abscissas. By superimposing one chart on another (their
horizontal axes coinciding) over a glass plate illuminated from beneath,
the similarities or differences between the two can be estimated. A
comparison of these corrected series is given in Figure 11.

In case we are not interested in the relative amplitudes of fluctuations,
but merely in the timing or lag of one series with respect to another,
or the relative shapes of the curves, the various corrected series
should be expressed in units selected with the object of making the
cyclical fluctuations of like magnitude. An appropriate unit for this
purpose is the standard deviation of the corrected items. Therefore,
before comparing the curves for lag or relative shape, the items of each
corrected series are divided by the standard deviation of that series.
The items thus expressed are called cycles. A comparison of three
series of cycles is given in Figure 12.

1 This is based on the assumption that seasonal variation increases proportionally
with the ordinate of secular trend.

2 In case we conceive the “normal” to be the ordinate of secular trend, the formula
becomes $\frac{y - so}{o}$. In actual practice, where the seasonal indexes range within 10
or 15 per cent of the average, the results obtained by the two formulas would not be
significantly different.

FIG. 11. COMPARISON OF THREE SERIES OF ECONOMIC STATISTICS CORRECTED
FOR SEASONAL INFLUENCES AND LONG-TIME TREND

A. Index of the Prices of Twenty Industrial Common Stocks.
B. Ten Commodity Price Index.
C. Rates on Commercial Paper.

(Expressed as percentages of the ordinates of linear secular trend.)

FIG. 12. COMPARISON OF THREE SERIES OF ECONOMIC STATISTICS CORRECTED
FOR SEASONAL INFLUENCES AND LONG-TIME TREND

A. Index of the Prices of Twenty Industrial Common Stocks.
B. Ten Commodity Price Index.
C. Rates on Commercial Paper.

(The percentage deviations of Figure 11 are expressed in terms of their respective standard deviations as
units.)

The amount of lag for the highest degree of correlation between two
series, or best “fit” when one curve is superimposed upon another, can
be determined only approximately by inspection. In order to get
numerical measures of the relative goodness of fit for various lags we must
resort to the coefficient of correlation (see Chapter VIII). The process
is as follows: First, compute the coefficient of correlation for the pairs
of items which, according to inspection of the charts, appear to be
most highly correlated; and, second, compute other coefficients for
lags of both greater and less amount. When some pairing gives a
higher coefficient than adjacent pairings, the degree of lag for
maximum correlation is indicated.¹ Thus, the coefficients of correlation
between the cycles of pig iron production and interest rates, monthly,
1903-14, with a lag of interest rates of 0, 3, 4, 5, 6, 7, 8, 9, and 12
months are, respectively: + .34, + .67, + .72, + .75, + .75, + .73,
+ .70, + .65, and + .45. These coefficients indicate that the
maximum correlation is for a lag of 5 or 6 months in interest rates, a fact
which is shown graphically by plotting the coefficients as ordinates and
time as abscissas (Figure 13).
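The process of trying successive lags can be sketched as below. The two series are artificial (a sine curve and a copy of it displaced five months), and the function names are assumptions; the pig iron and interest rate cycles themselves are not reproduced here.

```python
import math

def correlation(x, y):
    """Coefficient of correlation between two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def lag_correlations(leader, follower, lags):
    """Pair leader[t] with follower[t + lag] and correlate, for each lag."""
    return {lag: correlation(leader[:len(leader) - lag], follower[lag:]) for lag in lags}

# Hypothetical cycles: the follower repeats the leader five months later.
leader = [math.sin(2 * math.pi * t / 40) for t in range(120)]
follower = [math.sin(2 * math.pi * (t - 5) / 40) for t in range(120)]

by_lag = lag_correlations(leader, follower, range(0, 13))
best_lag = max(by_lag, key=by_lag.get)   # pairing with the highest coefficient
```

Plotting `by_lag` against the lag gives a curve of the kind shown in Figure 13, here with its maximum at five months by construction.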

The “probable errors” of coefficients of correlation (or other constants)
computed from time series of economic statistics do not have the usual
meaning. The theory of probability does not apply to our data because,


1 If in each of two series ($x$ and $y$) we take deviations from the straight line of
secular trend, the sums of such deviations will be zero, or $\sum(x - o_x) = 0$, and
$\sum(y - o_y) = 0$. If in addition these deviations are measured in terms of their
respective standard deviations and we denote the resulting ratios by $x'$ and $y'$, the
coefficient of correlation becomes $\frac{\sum x'y'}{n}$. In other words, the coefficient of
correlation is the arithmetic mean of products of corresponding deviations (when
measured in terms of the standard deviation) from the line of secular trend.

For percentage deviations (instead of actual deviations) from the line of secular
trend, corrected for seasonal variation, and expressed in terms of their standard
deviations (denoted by $x''$ and $y''$), the coefficient of correlation is an extremely close
approximation to $\frac{\sum x''y''}{n}$.
first, any past period that we select for study is, in fact, a special period
with characteristics distinguishing it from other periods, and so cannot be
considered a “random” selection; second, the individual items of the
series are not chosen independently, but they constitute a group of suc- 
cessive items with a characteristic conformation. Consequently, the 
“probable error” of 0.03 in the coefficient of correlation quoted above 
does not indicate, as one would conclude from the theory of probability, 
that if we compute a coefficient from “any” other actual period, the 


[Figure 13. Ordinates: coefficient of correlation; abscissas: lag in months.]

FIG. 13. COEFFICIENTS OF CORRELATION BETWEEN “CYCLES” OF PIG IRON
PRODUCTION AND “CYCLES” OF INTEREST RATES ON 60-90 DAY COMMERCIAL
PAPER, FOR VARIOUS DEGREES OF LAG IN INTEREST RATES


chances are equal that it will be between + 0.72 and + 0.78. In fact, 
the significance of the probable error of a constant computed from time 
series is not known. 

Coefficients of correlation, although useful in determining lag of time 
series, are after all merely averages. The specific relationships between 
two time series are much more adequately set forth by charts than by 
numerical measures. Also, there is great danger that coefficients based 
on time series may be wrongly interpreted. For instance, a high 
coefficient may result if two series fit their secular trends badly and the 
badness of fit in the two cases is similar.¹ The precise nature of the
correlation is often evident from charts when it is not revealed by
coefficients.


It is possible, if we are willing to dispense with the graphical comparison of 
the cycle charts, to eliminate the secular trends of two series and find the 
coefficient of correlation all in one step by the use of the methods of partial 
correlation. 

Suppose, in fact, that the seasonal variations have already been eliminated
and that the resulting series are $x'$ and $y'$. We consider the problem as one
involving three variables, $x'$, $y'$, and $t$; and the desired correlation between the
$x'$ and $y'$, corrected for secular trend, will be equal to

$$r_{x'y'\cdot t} = \frac{r_{x'y'} - r_{x't}\,r_{y't}}{\sqrt{1 - r_{x't}^2}\,\sqrt{1 - r_{y't}^2}},$$

which is the partial correlation coefficient of $x'$ on $y'$ independent of $t$. Moreover,
this value is not altered by the existence or amount of lag: it is at once the
desired maximum correlation. It should be remembered, however, that this
coefficient deals with deviations rather than the percentage deviations involved
in the cycles; but it can be shown that this fact has little bearing on the results.²
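The formula can be checked against the two-step procedure it replaces. The sketch below uses artificial series and assumed function names; it confirms only the algebraic identity that the partial coefficient equals the ordinary coefficient between deviations from the two fitted straight lines of trend, and makes no claim about lag.

```python
import math

def correlation(x, y):
    """Coefficient of correlation between two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def partial_correlation(x, y, t):
    """Correlation of x and y independent of t, by the formula in the text."""
    rxy, rxt, ryt = correlation(x, y), correlation(x, t), correlation(y, t)
    return (rxy - rxt * ryt) / (math.sqrt(1 - rxt ** 2) * math.sqrt(1 - ryt ** 2))

def detrended(series, t):
    """Deviations of a series from its least-squares straight line on t."""
    n = len(t)
    mt, ms = sum(t) / n, sum(series) / n
    slope = sum((a - mt) * (b - ms) for a, b in zip(t, series)) / sum((a - mt) ** 2 for a in t)
    return [b - ms - slope * (a - mt) for a, b in zip(t, series)]

# Hypothetical series: opposite linear trends with a partly shared fluctuation.
t = list(range(24))
x = [0.8 * i + math.sin(i) for i in t]
y = [-0.3 * i + math.sin(i) + 0.5 * math.cos(2.3 * i) for i in t]

one_step = partial_correlation(x, y, t)
two_step = correlation(detrended(x, t), detrended(y, t))
```

The two results agree to within rounding, which is the sense in which the trend is eliminated and the coefficient found in one step.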


Coefficients of correlation between first differences of series of cor- 
rected monthly items or annual figures are valuable when we desire to 
measure the similarity or dissimilarity of month-to-month or year-to- 
year changes, for instance, in investigating the year-to-year movements 
of the prices and production of crops. The correlation of the second 
and higher differences, the “variate difference method,”³ has been
proposed as a means of ascertaining the correlation between time series, 
but the method is based upon assumptions which cannot be retained 


1 Such is the result for most economic series covering both the period of declining 
prices previous to 1897 and the period of rising prices following that year. Nearly 
all economic series dip below the linear trend in the nineties so that a correlation co- 
efficient between their deviations would indicate that fact rather than the general 
correspondence of their fluctuations. (See W. M. Persons, “Construction of a Busi- 
ness Barometer,” Amer. Economic Rev. (Dec., 1916), p. 755, and H. L. Moore, Eco- 
nomic Cycles: their Law and Cause, p. 123.) 

2 W. L. Crum, “A Special Application of Partial Correlation,” Quar. Pub. Am.
Stat. Assoc. (December, 1921), pp. 949-52.

3 See W. M. Persons, “On the Variate Difference Correlation Method and
Curve-Fitting,” Quar. Pub. Amer. Statis. Assoc., June, 1917; and G. U. Yule, “The
Problem of Time Correlation, with especial reference to the Variate Difference
Correlation Method,” Jour. Roy. Statis. Soc., July, 1921. In order to be sound the method
would have to be altered to allow for the correlation between the original items of
a series. Since this was written an article by Karl Pearson and E. M. Elderton,
“On the Variate Difference Method,” has appeared in Biometrika for March, 1923,
[...] character of the variate difference method, as originally proposed.




even in the simplest problems, and analyses have shown it to be of 
little or no value for our purpose. 

We may summarize our conclusions as follows: The most satisfac- 
tory method of setting forth the relationships between time series of 
economic statistics is, first, to compute and compare their respective 
indexes of seasonal variation; second, to compute and compare their 
lines or curves of secular trend ; and, third, to correct the original items 
for seasonal variation and secular trend, and compare the resulting 
graphs. Coefficients of correlation for the various possible pairs of 
items of time series are useful mainly as a basis for judging the lag 
of one series with respect to another. 


CHAPTER XI 


PERIODOGRAM ANALYSIS¹
By W. L. CRUM 


HARMONIC ANALYSIS FOR KNOWN PERIODS 


The nature of periodicity. The study of a statistical series ordered in 
time includes, in many cases, an investigation of the existence and ex- 
tent of periodic variation. In many problems of natural science the 
periodic portion of the fluctuation is fairly prominent, and the examina- 
tion of periodicity is direct and simple; but in the problems of social 
science, and indeed in many of the more complex of the problems of 
natural science, the determination of periodicity is considerably more 
difficult. For these less obvious cases the methods of Fourier Analysis,²
supplemented by Schuster’s Periodogram,³ are coming into wide use.
We present in this chapter a summary description of these methods, 
and certain critical remarks on their use and applicability. 

We consider first, to bring out the basic properties of a periodic
function, the elementary periodic expression

$$y = a_0 + a \sin\left(\frac{2\pi}{p}t + \epsilon\right), \qquad (1)$$

in which $p$ is the period, $\epsilon$ is the phase, $a$ is the amplitude, and $a_0$ is
the mean value in any period. It is clear that $y$ has its mean value, $a_0$,
when

$$\frac{2\pi}{p}t + \epsilon = k\pi,$$

where $k$ is an integer; and that $a$ is the maximum deviation of $y$ from $a_0$,
and occurs when

$$\frac{2\pi}{p}t + \epsilon = \left(k + \tfrac{1}{2}\right)\pi.$$


1 The writer of this chapter is indebted to Professor Allyn A. Young for many
valuable suggestions.
2 W. E. Byerly, Fourier’s Series (Ginn, 1893), chaps. ii-iv.
3 D. Brunt, The Combination of Observations (Cambridge University Press, 1917),
chap. xii.
The graph of the curve is shown in Figure 14. The well-known geometrical
construction of $y$ as the projection, on a fixed line $OY$, of a point $A$,
revolving with constant angular velocity about a fixed point $A_0$, is
represented in Figure 15. These two figures exhibit the essential features of
the simple periodic function upon which our analysis is based.

[FIG. 14 and FIG. 15]

A familiar trigonometric transformation converts (1) into

$$y = a_0 + a_1 \cos\frac{2\pi}{p}t + b_1 \sin\frac{2\pi}{p}t, \qquad (2)$$

where

$$a_1 = a \sin\epsilon, \qquad b_1 = a \cos\epsilon, \qquad (2a)$$

from which

$$a = \sqrt{a_1^2 + b_1^2}, \qquad \epsilon = \tan^{-1}\frac{a_1}{b_1}. \qquad (3)$$

For problems in which the period is known in advance, the method
consists in determining the coefficients of (2), and then calculating the
amplitude and phase with the help of (3). The four elements, $p$, $a_0$, $a$, $\epsilon$,
are then known, and the periodic term is completely determined.
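Formulas (2a) and (3) can be put into a short computation. The function name and sample coefficients are assumptions; the two-argument arctangent is used here in place of $\tan^{-1}(a_1/b_1)$ because it settles the quadrant of the phase from the signs of $a_1$ and $b_1$, a detail the bare formula leaves open.

```python
import math

def amplitude_and_phase(a1, b1):
    """Amplitude a and phase eps such that a1*cos(w) + b1*sin(w) = a*sin(w + eps)."""
    a = math.hypot(a1, b1)       # a = sqrt(a1^2 + b1^2)
    eps = math.atan2(a1, b1)     # a1 = a sin(eps), b1 = a cos(eps)
    return a, eps

# Hypothetical coefficients of (2).
a, eps = amplitude_and_phase(3.0, 4.0)
```

With $a_1 = 3$ and $b_1 = 4$ the amplitude is 5, and substituting the phase back into (1) reproduces the two-term form (2) for every value of the angle.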

The analysis for a single known period. We suppose that we are
concerned with a simple physical problem possessing a known period, $p$.
If there are given three pairs of values of $t$ and $y$, provided only that the
values of $t$ are all different, substitution will give three algebraic
equations from which to get $a_0$, $a_1$, $b_1$. Then we can calculate the phase and
amplitude by use of (3), and the problem is completely solved.

In general, however, we should have data for more than three pairs
of values; and it is then impossible to make the function fit all the
observations exactly. In such a case, the coefficients, $a_0$, $a_1$, $b_1$, are
determined by the method of least squares, in order that the function may
fit all the data as nearly as possible.

Let it be assumed that there are $n$ observations covering a single
period, and that these observations are equally spaced in time at
intervals of $p/n$. Such a series is shown in Table I. Substitution in (2) gives
$n$ equations

$$y_i = a_0 + a_1 \cos 2\pi\frac{i}{n} + b_1 \sin 2\pi\frac{i}{n}; \qquad i = 0, 1, 2, \ldots, n-1, \qquad (4)$$

from which to determine $a_0$, $a_1$, $b_1$. If $n$ exceeds 3, it is unlikely that all
the equations (4) will be consistent; and the coefficients must then be
found by least squares.

To this end we construct the normal equations by the usual method,¹
and have

$$a_0 = \frac{1}{n}\sum_{i=0}^{n-1} y_i, \qquad a_1 = \frac{2}{n}\sum_{i=0}^{n-1} y_i \cos 2\pi\frac{i}{n}, \qquad b_1 = \frac{2}{n}\sum_{i=0}^{n-1} y_i \sin 2\pi\frac{i}{n}. \qquad (5)$$

By the use of (5), one can calculate at once the coefficients of the
function that fits the given series; and such a computation, for the data of
Table I, is shown in Table II.
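A direct computation by (5) may be sketched as follows. The data of Table I are not reproduced in this section, so the twelve observations below are hypothetical values generated from known coefficients; the function name is likewise an assumption.

```python
import math

def fundamental_coefficients(y):
    """Least-squares a_0, a_1, b_1 of (5) for n equally spaced observations over one period."""
    n = len(y)
    a0 = sum(y) / n
    a1 = 2.0 / n * sum(v * math.cos(2 * math.pi * i / n) for i, v in enumerate(y))
    b1 = 2.0 / n * sum(v * math.sin(2 * math.pi * i / n) for i, v in enumerate(y))
    return a0, a1, b1

# Hypothetical observations built from a_0 = 10, a_1 = 3, b_1 = 4 with n = 12.
observations = [10 + 3 * math.cos(2 * math.pi * i / 12) + 4 * math.sin(2 * math.pi * i / 12)
                for i in range(12)]
a0, a1, b1 = fundamental_coefficients(observations)
```

Because the observations were generated without irregularities, the computed coefficients return the assumed values; with actual data they would be the least-squares values.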

Simplified calculation. In practice the computation can be reduced
considerably, because of certain symmetric properties of the sine and
cosine. In general these schemes postpone the multiplication of the
$y_i$ by the various sine and cosine factors until certain necessary
combinations have been made among the $y_i$ themselves. The fundamental
method, as developed by Runge,² applies if $n$ is a multiple of 12; and a
full description of this scheme is presented by Carse and Shearer.³ An
adaptation of the method to cases in which $n$ is a multiple of 6 is given
by Running,⁴ and a scheme for $n$ a multiple of 4 is to be found in Brunt’s
book.⁵

It should be remarked that, although these arithmetic schemes
usually give equations for calculating the coefficients of all the harmonics
as well as the terms of fundamental period, they are available for the
limited case of three coefficients which we have in view. The point is,
as we shall notice later, that each coefficient of a Fourier’s series is
independent of all the others: one can stop the calculation at any stage
without needing to alter the coefficients already determined because of the
omission of those not determined.

For other arithmetical methods of calculation and for certain graphical
methods, reference is made to Carse and Shearer⁶; and, for the use of

1 Brunt, op. cit., pp. 77 seq. and 172 seq.

2 Zeitsch. für Math. und Phys. (1903), p. 448; and (1905), p. 117.

3 G. A. Carse and G. Shearer, A Course in Fourier’s Analysis and Periodogram
Analysis (London, Bell, 1915), chap. iii.

4 T. R. Running, Empirical Formulas (Wiley, 1917), chap. v.

5 Brunt, op. cit., pp. 181 seq.

6 Carse and Shearer, op. cit., chap. ii.
harmonic analyzers, to the Handbook of the Napier Tercentenary
Celebration.¹

We show, in Table III, the abridged calculation by the method
described by Running, for the series of Table I.

The analysis for multiple periods. Suppose next that y has more 
than one period, and that all the periods are sub-multiples of a single 
fundamental period. For the criticism and theoretical justification of 
the methods to be used, the reader is referred to a work on function 
theory? and to a recent report by Jackson*; but here we shall assume 
that y can be expressed by 


fim Gye dy COR oe bl Os cos 2 OE} ss a, coarse nt 
P Pp Pp 
. Qr ; 24 . 2r 
+ b; sin —t+ bo sin2=-t+---+b,sinr*t 
pet Pp P 
: 20 Aes 
= “-t+b — it). 
do + (e008 9 D + b,sin q P ) (6) 
If there are n observations and n = 2 r + 1, the coefficients, ao, aq, ba, 


can be determined exactly. If n exceeds 2 r + 1, we resort to the method 
of least squares and find for the normal equations 


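$$n\,a_0 = \sum_{i=0}^{n-1} y_i, \qquad a_q = \frac{2}{n}\sum_{i=0}^{n-1} y_i\cos q\frac{2\pi i}{n}, \qquad b_q = \frac{2}{n}\sum_{i=0}^{n-1} y_i\sin q\frac{2\pi i}{n}, \qquad q = 1, 2, \cdots, r. \tag{7}$$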


These formulas lend themselves at once to calculation, and the vari- 
ous special computation schemes mentioned above are available. As 
already noted, the values of the various $a_q$ and $b_q$ are independent of
each other; and the series may be broken off at any point. It can be
shown,⁴ however, that the larger we take r, up to the point where
n = 2r + 1, the better will be the fit to the actual data.
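The normal equations (7) can be checked numerically. The following is a minimal sketch in Python, assuming the twelve values of Table I as the observations $y_0, \ldots, y_{11}$; the function name `fourier_coefficients` is illustrative only. Worked with unrounded sines and cosines, it gives $a_0 = 21.325$, $a_1 \approx 15.0$, $b_1 \approx 8.69$, close to the hand-computed figures of Tables II and III though not identical to them.

```python
import math

def fourier_coefficients(y, r):
    """Least-squares coefficients of equation (7) for equally spaced values y."""
    n = len(y)
    a0 = sum(y) / n
    a = [2.0 / n * sum(y[i] * math.cos(q * 2 * math.pi * i / n) for i in range(n))
         for q in range(1, r + 1)]
    b = [2.0 / n * sum(y[i] * math.sin(q * 2 * math.pi * i / n) for i in range(n))
         for q in range(1, r + 1)]
    return a0, a, b

# Observations of Table I (n = 12).
table_i = [35.9, 38.6, 36.6, 29.9, 20.7, 13.1, 7.2, 3.3, 6.3, 12.1, 21.7, 30.5]

a0, a, b = fourier_coefficients(table_i, 1)
a0_more, a_more, b_more = fourier_coefficients(table_i, 3)

print(round(a0, 3), round(a[0], 2), round(b[0], 2))
# Taking more harmonics leaves the earlier coefficients unchanged.
print(a_more[0] == a[0], b_more[0] == b[0])
```

Because each coefficient is computed from its own sum, stopping at r = 1 or going on to r = 3 gives the same $a_1$ and $b_1$, which is the independence noted in the text.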

The case for arbitrary functions. If the number of known values of 
y becomes infinite, which happens when we seek to express a known func- 
tion by a Fourier’s series, we may let r in (6) become infinite; and (6) 


1 Modern Instruments and Methods of Calculation (London, Bell, 1914). 

2 J. Pierpont, Theory of Functions of Real Variables, II (Ginn, 1912), chap. xu.
3 Bull. Amer. Math. Soc., vol. 27 (1921), p. 415. 

4 Brunt, op. cit., p. 177. 
then takes the general form of Fourier’s series. Introduce in (7) a new
variable x,

$$x = \frac{p\,i}{n}, \qquad dx = \frac{p}{n},$$

and let n become infinite. The normal equations take the form

$$a_0 = \frac{1}{p}\int_0^p y\,dx, \qquad a_q = \frac{2}{p}\int_0^p y\cos q\frac{2\pi}{p}x\,dx, \qquad b_q = \frac{2}{p}\int_0^p y\sin q\frac{2\pi}{p}x\,dx. \tag{8}$$


We use (8) to calculate the coefficients of a Fourier’s series to fit any 
arbitrary function. A graphical illustration of the approximations 
to the function y = c, where c is a constant, is shown in Figure 16 for
r = 1, 2, 3, 4.


Fig. 16


THE PERIODOGRAM 


Introductory discussion of the periodogram. If, instead of n observa- 
tions at equal intervals p/n, there are kn observations extending over k 
periods, an obvious modification is made in the process. The values of 
$y_i$ are arranged in rows, each containing n successive values of $y_i$, and
each of the n columns is summed and the results are divided by k. These
average values of $y_i$ can then be treated by the method outlined above.

The method just described can be used if the statistical series is truly
periodic and if the period is known. The fact is, however, that the
“period” of most statistical time series is not exactly constant; and
the direct use of the Fourier analysis may lead to faulty interpretations. 
Moreover, the length of the period may not be obvious from the series 
or its diagram, and it becomes necessary to devise a plan for finding 
periods. Finally, a Fourier’s series, consisting of a term of fundamental 
period and its various harmonics, does not properly represent such series 
as exhibit the joint effect of several periodic terms for which periods are 
not commensurable. To attack these problems, we rely mainly upon
the periodogram devised by Schuster.¹

The validity of the periodogram analysis rests upon the fact that the 
summation method used tends to intensify the variation having a par- 
ticular period, and to smooth out the other variations. To show the 
way this takes place we use an illustration adapted from Carse and
Shearer.² Suppose we have, for successive instants of time, a series
consisting of the sums of two periodic components:

0, 2, 4, 6, 7, 7, 6, 4, 2, 0, 2, 4, ...

0, 1, 2, 3, 4, 3, 2, 1, 0, 1, 2, 3, ...

giving

0, 3, 6, 9, 11, 10, 8, 5, 2, 1, 4, 7, 10, 10, 9, 7, 4, ...
If we arrange the first 63 terms of the resultant series in 7 rows of 9 
terms each and add the columns, we get 


15, 30, 47, 56, 62, 61, 55, 42, 29. 
These values are given approximately by

$$7(l_i + 2); \qquad l_i = 0, 2, 4, 6, 7, 7, 6, 4, 2, \tag{9}$$
and it appears therefore that the summation has served to average out 
the shorter period component and to multiply the effect of the longer 
period component by 7, the number of rows. This is due to the fact 
that the longer period component exactly completes a period in each row ; 
whereas the shorter period component, which does not have an integral 
number of periods per row, suffers an advance in phase from row to 
row. It happens in this illustration that the two components are again 
in phase at the beginning of the ninth row; and, had we taken eight (or 
any multiple of eight) rows in computing the sums, we should have 
found that a rule such as (9) would fit the results exactly. For a 
number of rows not a multiple of eight the rule does not fit exactly. 
For instance, for eleven rows, the sums are 
19, 44, 73, 92, 102, 99, 85, 62, 41 


and the rule gives 
22, 44, 66, 88, 99, 99, 88, 66, 44. 


Had we taken nineteen rows, instead of eleven, the deviations from the 
rule would have been found numerically the same; but they would 


1 Terrestrial Magnetism (1898), pp. 13 seq. 
Proc. Roy. Soc., A, vol. 77 (London, 1906), p. 136. 
2 Carse and Shearer, op. cit., p. 30. 
be less important relatively. The really significant fact is that long
series of data must be used in periodogram work, if the single period 
under investigation is to be isolated. 

If the rows had been taken with ten terms each, instead of nine, it is 
obvious that the summing process would have tended to smooth out the 
long-period component also. The essence of the periodogram method is 
to take that trial period, the length of a row, which will bring out most 
prominently the effect of the period being sought. 
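The summing illustration above can be reproduced in a few lines. This is a sketch only, assuming the two components exactly as printed (periods of 9 and 8 terms); the names `LONG`, `SHORT`, and `column_sums` are illustrative. With these components the third column comes to 43 for seven rows and 69 for eleven, against the 47 and 73 printed above; the remaining columns agree with the text.

```python
LONG = [0, 2, 4, 6, 7, 7, 6, 4, 2]   # longer-period component (9 terms)
SHORT = [0, 1, 2, 3, 4, 3, 2, 1]     # shorter-period component (8 terms)

def resultant(length):
    """Sum of the two periodic components for t = 0 .. length - 1."""
    return [LONG[t % 9] + SHORT[t % 8] for t in range(length)]

def column_sums(values, row_length, rows):
    """Arrange values in rows of row_length terms and add the columns."""
    return [sum(values[r * row_length + c] for r in range(rows))
            for c in range(row_length)]

def rule(rows):
    """Rule of the form (9): rows * (l_i + 2)."""
    return [rows * (l + 2) for l in LONG]

y = resultant(720)
sums_7 = column_sums(y, 9, 7)
sums_8 = column_sums(y, 9, 8)
sums_11 = column_sums(y, 9, 11)
sums_19 = column_sums(y, 9, 19)
deviation_11 = [s - e for s, e in zip(sums_11, rule(11))]
deviation_19 = [s - e for s, e in zip(sums_19, rule(19))]
flat = column_sums(y, 10, 72)        # rows of ten terms smooth out both components

print(sums_7)
print(sums_8 == rule(8))
print(deviation_11 == deviation_19)
print(set(flat))
```

Eight rows fit the rule exactly, the deviations for eleven and nineteen rows coincide, and rows of ten terms (taken here over 72 rows) reduce every column to the same total.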

Changes effected by summing. Suppose the periodic component 
studied is 

$$a\sin\left(\frac{2\pi}{p}t + \epsilon\right)$$

and that we arrange our observations in n rows of m items each. The
result of summing can be shown ¹ to be

$$a\,\frac{\sin n\pi\frac{m}{p}}{\sin \pi\frac{m}{p}}\,\sin\left(\frac{2\pi}{p}t + \epsilon + (n-1)\pi\frac{m}{p}\right). \tag{10}$$

We observe that the phase has been increased by $(n-1)\pi\,m/p$, and the
amplitude has been multiplied by z, where

$$z = \frac{\sin n\pi x}{\sin \pi x}, \qquad x = \frac{m}{p}.$$

The maximum value of z is n, and occurs for x = 1: that is, for m = p.
The value of z falls off rapidly on each side of x = 1, and becomes 0 for
$x = 1 \pm 1/n$: that is, for $m = p\,(1 \pm 1/n)$. For values of x beyond these
points there are several maxima and minima of the z curve; but they
are all relatively unimportant, as may be seen from Figure 17. It is then
evident that, if we choose the right trial period, the summing process
will yield a series of sums possessing a periodic variation of large
amplitude; and, if our trial period deviates from the true period,
the resulting amplitude will be much smaller.

Fig. 17


1 Carse and Shearer, op. cit., p. 32. 
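The effect of summing on a single sine component can be verified numerically. The sketch below assumes the component $a\sin(2\pi t/p + \epsilon)$ and the multiplier $z = \sin n\pi x / \sin \pi x$ with $x = m/p$; the function names and the sample values of a, p, ε, m, n, and t are illustrative assumptions. It compares the direct sum over n rows with the closed form (10).

```python
import math

def z_factor(n, x):
    """Multiplier z = sin(n*pi*x) / sin(pi*x); the limiting value is used at integer x."""
    denominator = math.sin(math.pi * x)
    if abs(denominator) < 1e-12:
        return n * math.cos(n * math.pi * x) / math.cos(math.pi * x)
    return math.sin(n * math.pi * x) / denominator

def summed_directly(a, p, eps, m, n, t):
    """Add the entries standing at position t in each of n rows of m items."""
    return sum(a * math.sin(2 * math.pi * (t + k * m) / p + eps) for k in range(n))

def summed_by_formula(a, p, eps, m, n, t):
    """Closed form (10): amplitude multiplied by z, phase advanced by (n - 1)*pi*m/p."""
    x = m / p
    return a * z_factor(n, x) * math.sin(2 * math.pi * t / p + eps + (n - 1) * math.pi * x)

direct = summed_directly(3.0, 9.0, 0.4, 10.0, 7, 2.5)
closed = summed_by_formula(3.0, 9.0, 0.4, 10.0, 7, 2.5)
print(abs(direct - closed) < 1e-9)

# The multiplier has magnitude n at x = 1 and vanishes at x = 1 + 1/n.
print(abs(z_factor(11, 1.0)), abs(z_factor(11, 1.0 + 1.0 / 11)) < 1e-9)
```

At x = 1 the limiting value of z has magnitude n; its sign alternates with n, which is absorbed by the accompanying phase term $(n-1)\pi$.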
Determination of a period. The process indicated by our discussion
consists in selecting a trial period, m, arranging the data in rows of m
items each, summing the columns, and computing the amplitude of
the resultant series of sums by the Fourier method. The square of half
this amplitude, multiplied by the total time interval covered by the
statistics, is defined by Schuster ¹ as the ordinate of the periodogram for
the trial period m. Then another trial period, m’, is taken, and an
ordinate calculated for it. Schuster ² has shown that two trial periods
may be separated by an interval such that 


where m is approximately equal to m and m’, and N is the total number 
of periods m in the whole time interval under investigation. It should 
be noted further, in order to agree with Schuster’s analysis, that we as- 
sume here the time interval between two successive observations is unity. 
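The procedure just described can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: the ordinate is taken literally as the square of half the first-harmonic amplitude of the column sums, multiplied by the number of observations used (unit time interval), and the test series is the two-component series of the earlier illustration extended to 360 terms. The function names are hypothetical.

```python
import math

LONG = [0, 2, 4, 6, 7, 7, 6, 4, 2]   # component of period 9
SHORT = [0, 1, 2, 3, 4, 3, 2, 1]     # component of period 8

def first_harmonic_amplitude(values):
    """Amplitude of the fundamental term fitted to one row of values by (7)."""
    n = len(values)
    a1 = 2.0 / n * sum(v * math.cos(2 * math.pi * i / n) for i, v in enumerate(values))
    b1 = 2.0 / n * sum(v * math.sin(2 * math.pi * i / n) for i, v in enumerate(values))
    return math.hypot(a1, b1)

def periodogram_ordinate(series, m):
    """Ordinate for trial period m: (half amplitude of the column sums)^2 * time covered."""
    rows = len(series) // m                      # complete rows only
    sums = [sum(series[r * m + c] for r in range(rows)) for c in range(m)]
    amplitude = first_harmonic_amplitude(sums)
    total_time = rows * m                        # unit interval between observations
    return (amplitude / 2.0) ** 2 * total_time

series = [LONG[t % 9] + SHORT[t % 8] for t in range(360)]
ordinates = {m: periodogram_ordinate(series, m) for m in range(7, 12)}
best = max(ordinates, key=ordinates.get)
print(best)
for m in sorted(ordinates):
    print(m, round(ordinates[m], 1))
```

On this artificial series the largest ordinate falls at the trial period of 9, a secondary one at 8, and the ordinate at 10 is practically zero.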

When the ordinates have been calculated for a succession of trial 
periods m, we can plot them against the values of m as abscissas, and 
have then the periodogram. Values of m which yield large ordinates on 
this curve are in the vicinity of true periods, and they may be estimated 
from the curve itself. Owing, however, to the bluntness and lack of 
precision in the maxima of the periodogram, a careful determination of 
the period cannot be made from the curve. We use instead a scheme 
which is based upon the advance in phase suffered by a periodic term 
when the trial period differs from the true period, an advance noted in 
equation (10). If we have found a trial period m, which appears from 
the periodogram to be near a true period, we divide the n rows belong- 
ing to this trial period into groups of k successive rows each. It is then 
possible to calculate the average change in phase, β, from one group
to the next, and the true period is given by ³

$$p = \frac{m}{1 + \dfrac{\beta}{2\pi k}}.$$


The “oscillation” method. In order to save the labor of carrying
through the Fourier analysis to get the amplitude for the sums for each 
trial period, an approximate value may be taken as one half the dif- 
ference between the maximum and minimum of the averages obtained 
by dividing these sums by the number of rows. For a rough analysis, 

1 Proc. Roy. Soc., A, vol. 77 (1906), p. 138. 


2 Phil. Trans. Roy. Soc., A, vol. 206 (1906), p. 71. 
3 Brunt, op. cit., p. 197. 
these approximate values may be plotted at once as ordinates of the
periodogram. Yule ¹ has found that this method frequently leads to
serious errors, and it must therefore be used with caution. The real 
difficulty seems to be due partly to the inherent dangers in using the 
method with statistical series which are too short or subject to frequent 
extreme deviations. 
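As a small illustration of the approximation, the sketch below takes half the difference between the largest and smallest column averages. It assumes the two-component series of the earlier illustration, arranged in 8 rows of 9 terms; the function name is illustrative.

```python
LONG = [0, 2, 4, 6, 7, 7, 6, 4, 2]   # component of period 9
SHORT = [0, 1, 2, 3, 4, 3, 2, 1]     # component of period 8

def oscillation_amplitude(series, m):
    """Half the range of the column averages for trial period m."""
    rows = len(series) // m
    averages = [sum(series[r * m + c] for r in range(rows)) / rows for c in range(m)]
    return (max(averages) - min(averages)) / 2.0

series = [LONG[t % 9] + SHORT[t % 8] for t in range(72)]
approximate = oscillation_amplitude(series, 9)
print(approximate)
```

Here the column averages are 2, 4, 6, 8, 9, 9, 8, 6, 4, so the approximate amplitude is 3.5. A single extreme item in one column would shift this figure directly, which is in keeping with the caution urged in the text.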

Other methods of finding the periodogram are discussed in the texts 
of Carse and Shearer, and of Brunt. 


APPLICATIONS 


Periodogram analysis of commercial paper rate. As an illustration 
of the application of the preliminary periodogram analysis, we shall 
examine the monthly record of the rate of interest on 60-90 day com- 
mercial paper in New York City from 1899 to 1914, as given in Table 
IV. We select this as an example because, although it is not very 
well fitted for periodogram analysis, the increasing use of the periodo- 
gram method in economic statistics renders an illustration from eco- 
nomic material particularly useful. A rough survey of the data indicates 
that, in addition to the period of twelve months corresponding to the 
marked seasonal variation, there is an approximate period of about 40 
months. The portion of the periodogram in the vicinity of the 40-month 
trial period is shown in Figure 18, and the supporting data are given 
in Table VI. The detail calculations for the 43-month trial period are 
presented in Table V. 

It is apparent that 43 months gives the greatest intensity and, on 
the basis of the preliminary study, that would be accepted as the period. 
It should be observed, however, that there are several ordinates in the 
vicinity of 43 which are nearly as large as that at 43. There are two chief
reasons for this lack of precision in the periodogram. One is that the 
series is unduly short: only four trial periods as shown by Table V. It 
may be said at once that the use of the periodogram method on so short 
a series is quite unjustifiable except to give the approximate value of 
the period in order to assist in describing the fluctuations. The use of a 
period calculated from so short a series, to fit a periodic curve to the data, 
to substantiate a causal hypothesis, or to forecast the future, is extremely 
hazardous. Moreover, it is evident that the refinement of the result 
of the preliminary calculation, by the method outlined in Brunt (p. 197), 
is quite impossible in such an example. Likewise, it is not possible to 
break the series into halves and determine whether the same period 


1 Jour. Roy. Stat. Soc., vol. 84 (1921), p. 525. 
holds in both halves. Indeed, it cannot be too strongly emphasized that
the full application of the periodogram method with its proper tests re- 
quires that the series under investigation be much longer than that of 
Table IV. 

A second cause of the blurred maximum in the periodogram in the 
vicinity of 43 months is found in the nature of economic fluctuation. 
A historical economic series is characterized in general by frequent 
extreme and irregular deviations. The periodogram method does not 
effectively average out these extreme items; and, as they quite fre- 
quently occur in or near a particular phase of the cycle, they have a 
marked effect on the appearance of the periodogram. Perhaps the only 
sure way to discover the part played by these extreme items is to plot 
the entire record, using the times as abscissas and the data as ordinates. 
Moreover, the strong seasonal movement in the series under investiga- 
tion probably contributes to the confusion of the periodogram. If the 
series were much longer, this influence would doubtless be less impor- 
tant; but even then it can be shown that trial periods which are com- 
mensurable with the seasonal period will yield results in which the 
influence of the seasonal movement enters in an unknown degree. The 
best that can be said is that the periodogram analysis of an economic 
record must be supplemented by a most searching criticism in the light 
of the actual data. 

Critical remarks on the periodogram method. We have stated sev- 
eral times that it is important that the original data cover a long time 
interval. It cannot be insisted too strongly that this method of anal- 
ysis is not valid for a short statistical series. On the other hand, in 
many phenomena the period undergoes a change in length with the 
lapse of time ; and this will result in misleading inferences, if the periodo- 
gram is used blindly. An instance of the extreme care with which 
the method must be applied will be found in Schuster’s Sunspot paper,¹
in which he finds a notable change in period. An investigation of
overlapping periods has been made by Trachtenberg ² in the study of sta-
tistics of epidemics. 

The practice of breaking up the total time interval into segments, and 
analyzing each separately, should be followed where possible. For this 
purpose, each part of the series must be long. It is only in this way, 
however, that we can hope to reveal those changes in amplitude and 
phase, and even in period, which seem to be characteristic of many his- 
torical series. 


1 Phil. Trans. Roy. Soc., A, vol. 206 (1906), p. 69. 
2 Jour. Roy. Stat. Soc., vol. 84 (1921), p. 578. 
In seeking the possibility of spurious results due merely to chance,
Schuster ¹ studied the probability of accidental appearance of a given
amplitude. He has developed a formula which gives a minimum limit 
to the length of a series of observations if we are to rely upon the de- 
termination of amplitudes which are as small as a given fraction of the 
mean value. This study must serve as a warning against the careless 
use of the delicate instrument furnished by the periodogram. 

Maxima and minima by inspection. In general, we may say that it 
is unwise to lose sight of the original statistics. If these be plotted, 
against the time as abscissas, it may often be possible to pick out by 
inspection the various maxima and minima. It is suggested ² that the
corresponding period may be computed by the use of

$$T = \frac{6}{n(n^2-1)}\left\{(n-1)(t_n - t_1) + (n-3)(t_{n-1} - t_2) + (n-5)(t_{n-2} - t_3) + \cdots\right\},$$

a graduation formula which will be found derived in Tuttle.³ This
method of finding T must be used with caution; for, if there are
several component periodic terms in the series, it is quite likely that 
we shall be unable to pick out the maxima and minima by inspection 
of the graph. 
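The graduation formula can be sketched as below. The sketch assumes that $t_1, \ldots, t_n$ are the times of n successive maxima (or of n successive minima) read from the graph, and that T is the least-squares slope of these times against their serial numbers; the sample times and function names are hypothetical.

```python
def period_from_turning_points(times):
    """Weighted pairing formula: 6 / (n (n^2 - 1)) * sum of (n - 1 - 2j)(t_{n-j} - t_{1+j})."""
    n = len(times)
    total = 0.0
    for j in range(n // 2):
        weight = n - 1 - 2 * j
        total += weight * (times[n - 1 - j] - times[j])
    return 6.0 * total / (n * (n * n - 1))

def least_squares_slope(times):
    """Slope of the straight line fitted to the times against serial numbers 1 .. n."""
    n = len(times)
    mean_index = (n + 1) / 2.0
    mean_time = sum(times) / n
    numerator = sum((i - mean_index) * (t - mean_time)
                    for i, t in enumerate(times, start=1))
    denominator = sum((i - mean_index) ** 2 for i in range(1, n + 1))
    return numerator / denominator

maxima = [2.0, 13.0, 25.0, 35.0, 46.0]   # hypothetical times of successive maxima
period = period_from_turning_points(maxima)
print(period, least_squares_slope(maxima))
```

For these five times both computations give a period of 11, which shows that the pairing formula is only a convenient arrangement of the least-squares arithmetic.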

It seems, therefore, that the periodogram offers the best present 
method for studying periodicities ; but it must be applied only to series 
which are sufficiently long, and its use should be accompanied by a 
careful supplementary study of the segments of the series. 


1 Terrestrial Magnetism (1898), p. 18. 
2 By Professor Allyn A. Young. 
3 L. Tuttle, The Theory of Measurements (Phila., 1916), p. 246.
TABLES AND PERIODOGRAM CHART


TABLE I

t    0      p/12   2p/12  3p/12  4p/12  5p/12  6p/12  7p/12  8p/12  9p/12  10p/12  11p/12
y    35.9   38.6   36.6   29.9   20.7   13.1   7.2    3.3    6.3    12.1   21.7    30.5


(If p is not 2π, we may make a change of variable so that subsequent calcula-
tions can be carried through on the assumption that p is 2π.)


TABLE II
n = 12

 i    360i/n   cos 360i/n   sin 360i/n    y_i     y_i cos 360i/n   y_i sin 360i/n
 0       0        1.0          0.0        35.9         35.9              0.
 1      30         .87          .5        38.6         31.3             19.3
 2      60         .5           .87       36.6         18.3             31.8
 3      90        0.0          1.0        29.9          0.              29.9
 4     120       - .5           .87       20.7       - 10.4             18.0
 5     150       - .87          .5        13.1       - 11.4              6.6
 6     180       -1.0          0.0         7.2       -  7.2              0.
 7     210       - .87        - .5         3.3       -  2.9           -  1.7
 8     240       - .5         - .87        6.3       -  3.2           -  5.5
 9     270        0.0         -1.0        12.1          0.            - 12.1
10     300         .5         - .87       21.7         10.9           - 18.9
11     330         .87        - .5        30.5         26.5           - 15.3
Totals                                   255.9         87.8             52.1


From which: a_0 = 21.325, a_1 = 14.63, b_1 = 8.68, by use of (5).


TABLE III

               35.9   38.6   36.6   29.9   20.7   13.1    7.2    y_i, i from 0 to 6
                      30.5   21.7   12.1    6.3    3.3           y_i, i from 11 to 7

v_i, 0 to 6    35.9   69.1   58.3   42.0   27.0   16.4    7.2    Sums (S)
w_i, 1 to 5            8.1   14.9   17.8   14.4    9.8           Differences (D)

               35.9   69.1   58.3   42.0   v_i, 0 to 3        8.1   14.9   17.8   w_i, 1 to 3
                7.2   16.4   27.0          v_i, 6 to 4        9.8   14.4          w_i, 5 to 4

p_i, 0 to 3    43.1   85.5   85.3   42.0   r_i, 1 to 3       17.9   29.3   17.8
q_i, 0 to 2    28.7   52.7   31.3          s_1 and s_2      - 1.7     .5

               43.1   85.5   p_0 and p_1                     17.9   28.7   r_1 and q_0
               85.3   42.0   p_2 and p_3                     17.8   31.2   r_3 and q_2
l_0 and l_1   128.4  127.5   t_1 and t_2                       .1  - 2.6


Which give: a_0 = 21.325, a_1 = 14.98, b_1 = 8.51 and, by (3), amplitude is 17.3,
phase angle is 60° 20’.
TABLE IV. INTEREST RATES ON 60-90 DAY COMMERCIAL PAPER ¹


Deviations from 5 per cent; in units of 25 per cent


MAY | JUNE | JULY | AUG. | SEPT. | OCT. | NOV. | DEC.


— 13 15) |e 


4 5 6 5) —3 
9) 1 — 5) | 7) a ee 
00 6 9 7 10 
10 8 7 6 7 


1 W. L. Crum, Rev. Econ. Stat., January, 1923.
CHAPTER XII


INDEX NUMBERS 
By ALLYN A. YOUNG 


INDEX numbers are series of numbers which measure and express the 
relative changes, as from time to time or from place to place, in the mag- 
nitude of statistical groups or aggregates of variables. Sometimes a 
series of numbers proportional to some other simple statistical series of 
numbers, not of groups, is called a series of index numbers. It is better, 
however, to refer to these simpler proportional series as series of relative 
numbers or merely as series of relatives. 

In constructing a series of relatives some one term of the statistical 
series is selected as the base, its relative made 100 or unity, and the other 
relatives expressed as per cents or hundredths. Relative numbers have 
a large and growing use as an expository device, especially in economic 
statistics. 

Except for the choice of the base, no special problems attach to the 
construction of series of relatives. Index numbers of group changes, 
however, present serious difficulties. Such index numbers are of three 
general types, corresponding to three different ways of expressing the 
magnitude of group change: (1) The changes undergone by each sep- 
arate variable in the group may be expressed by a series of relatives, and 
the averages of such relatives taken to express the changes of the group. 
(2) The group may be represented by an appropriate average and a 
series of such averages expressed by relatives. (3) The individual va- 
riables in the group may be combined so as to form an aggregate, and a 
series of such aggregates expressed by relatives. Index numbers of the 
first type are averages of relatives; of the second type, ratios of aver- 
ages; of the third type, ratios of aggregates. 



AVERAGES OF RELATIVES 


The arithmetic average. Index numbers have found their most im- 
portant use in measuring price changes. The types most commonly 
used in the past for that purpose have been series of arithmetic averages, 


weighted or unweighted, of relatives which express the periodic changes 
of the different prices in a given group. It has been found, however,
that unless they are weighted so as virtually to make them no longer 
averages of relatives, such index numbers are generally untrustworthy, 
and in particular that they are subject to biased errors. 

If $p_1$, $p'_1$, $p''_1$, etc., are the prices of n different commodities in a year
selected as a base, and $p_2$, $p'_2$, $p''_2$, etc., their prices in a second year, the
unweighted arithmetic average of relatives for the second year, taking
the base as unity, is

$$\frac{\dfrac{p_2}{p_1} + \dfrac{p'_2}{p'_1} + \dfrac{p''_2}{p''_1} + \cdots}{n} = \frac{1}{n}\,\Sigma\left(\frac{p_2}{p_1}\right). \tag{1}$$

Let $p_2 = p_1 + \Delta p_1$, $p'_2 = p'_1 + \Delta p'_1$, etc., where $\Delta p_1$, $\Delta p'_1$, etc., may
be positive or negative. Then the expression for the unweighted arith-
metic average of relatives becomes

$$1 + \frac{1}{n}\,\Sigma\left(\frac{\Delta p_1}{p_1}\right).$$

The fundamental defect of the unweighted arithmetic average of 
relatives is that the weight given to the amount of change of the price 
of any commodity is inversely proportioned to the magnitude of the 
price of that commodity in the base year. As soon as prices change, 
dividing them by prices in the base year ceases to bring them to an 
exactly comparable basis — unless, indeed, all prices happen to change 
at the same rate. The greater the differences of the rates of change; 
that is, the greater their dispersion, the less accurately do the series of 
relatives represent them. 

Biased error. In a series of increasing prices, $p_1$, if it be the first price
in the series, will be relatively small. In a series of decreasing prices 
it will be relatively large. Adding the relatives computed on such a 
base, therefore, gives too much weight to increasing and too little to 
decreasing prices. Suppose, for example, that we are concerned with 
two commodities only. In the base year the price of one is $1, of the 
other, $2. In another year the first price becomes $2, while the second 
falls to $1. Giving equal weights to the two commodities, there is, by 
any reasonable test, no net movement of prices. Yet the price relatives 
become 200 and 50, and their arithmetic mean, 125, indicates an average 
increase in prices of 25 per cent. If we take the same figures and use 
the later year as a base, an average fall of prices from 125 to 100 (that
is, of 20 per cent) is indicated. It should also be noted that the arith-
metic average of relatives gives relatively more weight to prices that 
rise rapidly than to prices that rise slowly, and relatively more weight
to prices that fall slowly than to prices that fall rapidly. This average, 
therefore, exaggerates a general rise and understates a general fall in 
prices. A purely random set of changes would be reported by the arith- 
metic average of relatives as an average increase. 
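The two-commodity example can be worked through directly. This is a minimal sketch using the prices of the text ($1 and $2 in the base year, $2 and $1 in the other year); the function name is illustrative.

```python
def arithmetic_average_of_relatives(base_prices, other_prices):
    """Unweighted arithmetic average of the relatives other/base, base taken as unity."""
    relatives = [other / base for base, other in zip(base_prices, other_prices)]
    return sum(relatives) / len(relatives)

first_year = [1.0, 2.0]
second_year = [2.0, 1.0]

# First year as base: relatives 2.00 and 0.50.
forward = arithmetic_average_of_relatives(first_year, second_year)
# Second year as base: the first year then stands at 125, the second at 100.
backward = arithmetic_average_of_relatives(second_year, first_year)
implied_fall = 1.0 - 1.0 / backward

print(forward, backward, round(implied_fall, 2))
```

The same pair of price changes is thus reported as a rise of 25 per cent on one base and as a fall of 20 per cent on the other, although by construction there is no net movement.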

Types of bases. In computing index numbers the base is sometimes 
broadened to include a number of years, or other periods. It may even 
be coextensive with the whole series of index numbers. Thus, for a 
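series of relatives covering m years the base might be the arithmetic
average, $\dfrac{p_1 + p_2 + \cdots + p_m}{m}$. The index number for any year k would
then be

$$\frac{1}{n}\left(\frac{m\,p_k}{p_1 + p_2 + \cdots + p_m} + \frac{m\,p'_k}{p'_1 + p'_2 + \cdots + p'_m} + \frac{m\,p''_k}{p''_1 + p''_2 + \cdots + p''_m} + \cdots\right).$$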

As compared with the single base taken at the beginning of the series 
a broadened base of the inclusive type, if the averages of both of the 
successive prices and of the relatives are arithmetic, will give a bias in 
the same direction if prices in general are decreasing, but will give an 
opposite bias if they are increasing. In either case it lessens somewhat 
the amount of biased error. Using as a base an average of prices in a
relatively small group of successive years at the beginning of a rela- 
tively long series has a negligible effect in lessening bias. Its principal 
advantage is that it diminishes the chance that the base figures for one or 
more series of relatives will be distinctly abnormal or unrepresentative. 

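A moving base is sometimes utilized. In such case chain or link rela-
tives, such as $\dfrac{p_2}{p_1}$, $\dfrac{p_3}{p_2}$, $\dfrac{p_4}{p_3}$, etc., take the place of fixed-base relatives, such
as $\dfrac{p_2}{p_1}$, $\dfrac{p_3}{p_1}$, $\dfrac{p_4}{p_1}$, etc. A series such as $\dfrac{\Sigma(p_2/p_1)}{n}$, $\dfrac{\Sigma(p_3/p_2)}{n}$, $\dfrac{\Sigma(p_4/p_3)}{n}$, etc., is called a
series of chain index numbers.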
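The numbers in such a series may be referred to a fixed base by
successive multiplications, so that the chain-derived fixed-base index
number for the mth year from the base becomes:

$$\frac{\Sigma(p_2/p_1)}{n}\times\frac{\Sigma(p_3/p_2)}{n}\times\cdots\times\frac{\Sigma(p_m/p_{m-1})}{n}.$$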

No particular advantages have commonly been claimed for the chain- 
derived index number as a means of making comparisons with a fixed 
base. It is sometimes supposed, however, that chain index numbers 
derived from link relatives give more accurate direct year-with-year
comparisons than do chain index numbers derived from arithmetic aver-
ages of fixed-base relatives by division, as, for example, for the mth year:
$\Sigma\left(\dfrac{p_m}{p_1}\right) \div \Sigma\left(\dfrac{p_{m-1}}{p_1}\right)$. But the reverse is true.

In general, year-with-year comparisons derived from arithmetic aver-
ages of fixed-base relatives are more trustworthy than the direct com- 
parisons with the base which such averages give. The reasons are as 
follows: (1) Let M be the arithmetic average of such relatives. Di-
viding $M_n$ by $M_{n-1}$ eliminates all systematic or biased error, except for
the excess of the error of one average over the error of the other. Even
if the amount of systematic error increases proportionately or more
than proportionately to the distance from the base, such excess or dif-
ference will in general be relatively small as compared with the error of
either $M_n$ or $M_{n-1}$. (2) In fact, however, the rate of increase of the sys-
tematic error of the average diminishes as its distance from the base 
increases. Centripetal forces in the price system tend to keep the 
trends of different prices in alignment. In any year prices that have 
already increased at more than the average rate are more likely to fall 
and less likely to rise than prices which have increased less than the 
average. Just so far as these tendencies operate, they lessen the dis- 
parity between the true rates at which prices change in the given year 
and the changes of the fixed-base relatives. Link relatives, however, 
start afresh each year, so that the disparities which create biased error 
are at their worst.! 

The difference between the increase of prices between the year k and
the year k + 1 as reported by the arithmetic average of link relatives
and by the quotient of the arithmetic averages of fixed-base relatives for
the years k and k + 1 is equal to $\dfrac{\Sigma(r_i d_i)}{n}$, when $(1 + r_i)$ is a link relative
and $d_i$ is the relative or percentage difference between a fixed-base rela-
tive, $\dfrac{p_k}{p_1}$, and the arithmetic average of such relatives in the year k.²
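The relation between the two year-with-year measures can be checked on a small example. The sketch below assumes three commodities with hypothetical prices in the base year, the year k, and the year k + 1, and takes the difference in the sense "quotient of fixed-base averages minus average of link relatives"; the sign convention and all names are assumptions made for the illustration.

```python
def mean(values):
    return sum(values) / len(values)

base_year = [1.0, 2.0, 4.0]     # hypothetical prices in the base year
year_k = [1.5, 2.2, 3.6]        # hypothetical prices in year k
year_k1 = [1.8, 2.0, 4.5]       # hypothetical prices in year k + 1

fixed_k = [pk / p1 for p1, pk in zip(base_year, year_k)]
fixed_k1 = [pk1 / p1 for p1, pk1 in zip(base_year, year_k1)]
links = [pk1 / pk for pk, pk1 in zip(year_k, year_k1)]

link_average = mean(links)
quotient_of_averages = mean(fixed_k1) / mean(fixed_k)

r = [link - 1.0 for link in links]                    # link relative = 1 + r_i
d = [rel / mean(fixed_k) - 1.0 for rel in fixed_k]    # relative deviation from the year-k average
sum_rd_over_n = sum(ri * di for ri, di in zip(r, d)) / len(r)

difference = quotient_of_averages - link_average
print(round(difference, 6), round(sum_rd_over_n, 6))
```

The identity holds because the deviations $d_i$ sum to zero, so that the quotient of the averages equals $1 + \Sigma(r_i)/n + \Sigma(r_i d_i)/n$, while the average of link relatives equals $1 + \Sigma(r_i)/n$.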


The only substantial advantage in the use of link relatives is that 
they permit of ‘splicing’; that is, of changing some of the series of 


1F. R. Macauley, in The American Economic Review, vol. 6 (1916), p. 208, noted 
that “chain numbers draw away (upwards) from the fixed-base numbers,” and rightly 
attributed it to ‘‘a greater tendency to rise and a less tendency to fall (in percentages) 
with the smaller relatives than with the larger relatives.” 

2 This is, in slightly modified form, a result reached by W. F. Ogburn, in Bulletin
of the U.S. Bureau of Labor Statistics, No. 284 (1921), p. 88. 


price quotations used or even of increasing their number in accordance
with changes in the available data. But certain other types of index 
numbers share this advantage, and some of them are preferable on other 
grounds to arithmetic averages of link relatives. 

If the arithmetic average of relatives is to be employed for the direct 
comparison of prices in two different years or places, its bias may be 
eliminated by using the mean proportionals between prices at the two 
periods or places as bases. This gives the formula: 


$$\frac{\Sigma\left(\dfrac{p_2}{\sqrt{p_1 p_2}}\right)}{\Sigma\left(\dfrac{p_1}{\sqrt{p_1 p_2}}\right)}, \quad\text{or}\quad \frac{\Sigma\sqrt{p_2/p_1}}{\Sigma\sqrt{p_1/p_2}}. \tag{4}$$


This is probably, in principle, the best form of the unweighted arith- 
metic average of relatives. It even has some points of superiority over 
the geometric average, with which in general it will agree very closely. 
If only two series of prices are used, it will be identical with the geo- 
metric mean. 
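A short numerical check of formula (4) follows. It is a sketch under the assumption that (4) is the ratio $\Sigma\sqrt{p_2/p_1} \div \Sigma\sqrt{p_1/p_2}$; the prices and function names are hypothetical.

```python
import math

def mean_proportional_index(p1, p2):
    """Formula (4): relatives taken on the bases sqrt(p1 * p2)."""
    numerator = sum(math.sqrt(b / a) for a, b in zip(p1, p2))
    denominator = sum(math.sqrt(a / b) for a, b in zip(p1, p2))
    return numerator / denominator

def geometric_average_of_relatives(p1, p2):
    n = len(p1)
    return math.prod(b / a for a, b in zip(p1, p2)) ** (1.0 / n)

# Two commodities: the result coincides with the geometric average.
two_first, two_second = [3.0, 5.0], [4.0, 9.0]
index_two = mean_proportional_index(two_first, two_second)
geometric_two = geometric_average_of_relatives(two_first, two_second)

# Reversing the two years gives the reciprocal, so no bias of base remains.
reversed_two = mean_proportional_index(two_second, two_first)

# Three commodities: close to, but in general not equal to, the geometric average.
three_first, three_second = [1.0, 2.0, 4.0], [1.5, 2.2, 3.6]
index_three = mean_proportional_index(three_first, three_second)
geometric_three = geometric_average_of_relatives(three_first, three_second)

print(round(index_two, 6), round(geometric_two, 6))
print(round(index_three, 4), round(geometric_three, 4))
```

For the $1 and $2 example of the preceding section this index is exactly 1, in agreement with the judgment that no net movement of prices has taken place.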

The harmonic average. The unweighted harmonic average of rela- 
tives for the year k is: 


$$\frac{n}{\dfrac{p_1}{p_k} + \dfrac{p'_1}{p'_k} + \dfrac{p''_1}{p''_k} + \cdots} = \frac{n}{\Sigma\left(\dfrac{p_1}{p_k}\right)}. \tag{5}$$

It has sometimes been urged that the harmonic average is the proper 
measure of a change of the purchasing power of money, since purchasing 
power is the reciprocal of price, and a change in price is accompanied 
by an inverse change of purchasing power. The point is not sound. A 
change in price is an inverse change in purchasing power. In a properly
constructed index number it should be a matter of indifference whether
price changes or their reciprocals are the component units. But the
harmonic average does not agree with the arithmetic average. It
uniformly gives a smaller result. A glance at its formula will show that
the harmonic average of relatives is the reciprocal of the arithmetic
average with the base shifted to the other of the two years involved in the
comparison. It has the same type of biased error as the arithmetic
average, but its error is in the opposite direction. In general, therefore,
what has been said of the arithmetic average of relatives holds, mutatis 
mutandis, for the harmonic average. For example, just as the arith- 
metic average exaggerates a general rise of prices, so the harmonic 
average understates it. 
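The relation between the two averages can be seen in a few lines. The sketch assumes three commodities with hypothetical prices; the function names are illustrative.

```python
def arithmetic_average_of_relatives(base, other):
    """Formula (1): average of other/base."""
    return sum(o / b for b, o in zip(base, other)) / len(base)

def harmonic_average_of_relatives(base, other):
    """Formula (5): n divided by the sum of base/other."""
    return len(base) / sum(b / o for b, o in zip(base, other))

base_year = [1.0, 2.0, 4.0]   # hypothetical prices in the base year
year_k = [1.5, 2.2, 3.6]      # hypothetical prices in year k

arithmetic = arithmetic_average_of_relatives(base_year, year_k)
harmonic = harmonic_average_of_relatives(base_year, year_k)
arithmetic_reversed = arithmetic_average_of_relatives(year_k, base_year)

print(round(arithmetic, 4), round(harmonic, 4), round(1.0 / arithmetic_reversed, 4))
```

The harmonic average coincides with the reciprocal of the arithmetic average taken on the other base, and it falls below the arithmetic average on the original base.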
Compromise index numbers. Because the biased errors of the arith-
metic and harmonic averages of relatives are opposite in direction and 
similar in type, it is obvious that they may be eliminated, in large 
measure at least, by blending the two averages. In principle it is better 
to use their geometric rather than their arithmetic mean, on the ground 
that the relative rather than the absolute amounts of the biased errors 
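are more nearly compensating. In practice, however, there is little differ-
ence between the formulas

$$\sqrt{\Sigma\left(\frac{p_k}{p_1}\right) \div \Sigma\left(\frac{p_1}{p_k}\right)} \quad (6a) \qquad\text{and}\qquad \frac{1}{2}\left[\frac{\Sigma\left(\dfrac{p_k}{p_1}\right)}{n} + \frac{n}{\Sigma\left(\dfrac{p_1}{p_k}\right)}\right] \quad (6b)$$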

This method of “ rectification,” or logical compromise, is applicable 
to a wide range of index numbers, weighted and unweighted, that ex- 
hibit systematic biased error.¹ Although unsatisfactory from the point
of view of principle, and, in particular, complicated and difficult to com-
pute, these compromise or “rectified” index numbers give, in general,
satisfactory results. In practice it will be found that index numbers
computed by formulas (4), (6 a), (6 b), and (7) will not differ notably.²

The geometric average. The unweighted geometric average has cer- 
tain qualities that are highly desirable in index numbers. Some of 
these qualities are exhibited by the following elementary relations: 


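$$\sqrt[n]{\frac{p_k}{p_1}\times\frac{p'_k}{p'_1}\times\cdots(n\ \text{terms})} = \frac{1}{\sqrt[n]{\dfrac{p_1}{p_k}\times\dfrac{p'_1}{p'_k}\times\cdots}} = \frac{\sqrt[n]{p_k\times p'_k\times\cdots}}{\sqrt[n]{p_1\times p'_1\times\cdots}}. \tag{7}$$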
The geometric average of relatives is the reciprocal of the geometric 
average of relatives with the base reversed. That is, it is independent 
of the base. Index numbers computed on one base can be shifted to
another base by simple division. Splicing is therefore feasible. Chain 
and fixed-base methods give identical results. Moreover there is no 
difference between the unweighted geometric averages of relatives and 
the ratios between geometric averages of actual prices. In brief, geomet- 
ric averages give index numbers that are self-consistent. Furthermore, 
the geometric average is, on logical grounds, an appropriate average 


1 The possibilities of this method have been systematically explored by Irving
Fisher, in The Making of Index Numbers (1922). The method is not applicable when 
the two series are both either arithmetic or harmonic averages. 

2 The relation between the geometric mean of the arithmetic and harmonic aver-
ages and the geometric average (formulas 6 a and 7) has been investigated by C. M. 
Walsh, in The Measurement of General Exchange-Value (1901), pp. 516-19. 
of relatives. It has no systematic bias. It is a true average (although
not the only possible type of true average) of rates of change.
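The self-consistency properties claimed for the geometric average can be verified directly. The following is a minimal sketch with hypothetical prices for three commodities in three years; the function names are illustrative and not taken from the Handbook.

```python
from math import isclose, prod

def geometric_index(base, stated):
    """Unweighted geometric average of the relatives stated/base."""
    n = len(base)
    return prod(s / b for b, s in zip(base, stated)) ** (1 / n)

def geometric_mean(values):
    return prod(values) ** (1 / len(values))

# Hypothetical prices of three commodities in three years.
year1 = [2.0, 5.0, 10.0]
year2 = [3.0, 4.0, 12.0]
year3 = [2.5, 6.0, 11.0]

forward = geometric_index(year1, year2)
backward = geometric_index(year2, year1)
chained = geometric_index(year1, year2) * geometric_index(year2, year3)
direct = geometric_index(year1, year3)
ratio_of_means = geometric_mean(year3) / geometric_mean(year1)

print(isclose(forward * backward, 1.0))   # reversing the base gives the reciprocal
print(isclose(chained, direct))           # chain and fixed-base results agree
print(isclose(direct, ratio_of_means))    # average of relatives = ratio of averages
```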

The median. Simplicity and definiteness of meaning are the princi- 
pal advantages of the median. It has no biased error. But it is un- 
reliable and erratic unless the number of prices (or other items) in the 
group is fairly large. It is insensitive to forces that show themselves 
in price changes only on one side or the other of the median. Never- 
theless medians of price relatives generally give much more reliable un- 
weighted index numbers than do arithmetic averages, other than the 
special type (4). 


RATIOS OF AVERAGES AND OF AGGREGATES 


Ratios of averages. Arithmetic averages of actual unit prices¹ do
not have the biased errors that inhere in arithmetic averages of relatives.
But they suffer from the fact that unit prices of different goods differ
greatly in magnitude, varying not only with the value of the commodity,
but also with the physical unit (for example, ton, ounce, yard) the
price of which is quoted. The “unweighted” average is in reality
arbitrarily weighted.² The harmonic average, utilizing the reciprocals
of unit prices, reverses the weighting. In the one case single units of
goods and their money prices are the components of the average; in
the other case units of money (dollars) and “dollar’s worths” of goods.
The difficulty may be removed, however, and the arithmetic and harmonic
averages brought into agreement by weighting the arithmetic
average proportionately to physical quantities of goods (q) and the
harmonic by quantities of money, or values (qp), for

$$\frac{\sum (qp)}{\sum (q)} = \frac{\sum (qp)}{\sum \left( qp \cdot \dfrac{1}{p} \right)} \qquad (8)$$
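The agreement stated in (8) can be confirmed numerically. This is a small sketch under assumed data: the prices, quantities, and function names are hypothetical.

```python
from math import isclose

def weighted_arithmetic(prices, weights):
    """Arithmetic average of prices with the given weights."""
    return sum(w * p for p, w in zip(prices, weights)) / sum(weights)

def weighted_harmonic(prices, weights):
    """Harmonic average of prices with the given weights."""
    return sum(weights) / sum(w / p for p, w in zip(prices, weights))

# Hypothetical unit prices p and physical quantities q for three goods.
p = [2.0, 5.0, 10.0]
q = [10.0, 4.0, 1.0]
values = [qi * pi for qi, pi in zip(q, p)]   # money values qp

by_quantity = weighted_arithmetic(p, q)      # arithmetic, weighted by q
by_value = weighted_harmonic(p, values)      # harmonic, weighted by qp
print(by_quantity, by_value)
```

Both expressions reduce to the total value divided by the total quantity, which is why they coincide.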


A series of index numbers such as $\frac{\sum (q_1 p_1)}{\sum (q_1)}$, $\frac{\sum (q_k p_k)}{\sum (q_k)}$, etc., however, has
the defect that if the weights of commodities whose unit prices happen
to be large increase faster than the other weights, the index number will
be too large, and vice versa. This source of error may be removed by
converting the weights into “dollar’s worths.” When the conversion


1 For a more detailed analysis see A. A. Young, “The Measurement of Changes 
of the General Price Level,” Quarterly Journal of Economics, vol. 35 (1921), pp. 
557-73. 

2 Reducing physical units to pounds (as in Bradstreet’s index numbers) or to some 
other common measure merely introduces another sort of arbitrary weighting. 
is made on the basis of prices in the first year, the index number for the
other year becomes the ratio of aggregates,

$$\frac{\sum (q_k p_k)}{\sum (q_k p_1)} \qquad (9)$$

Converting on the basis of prices in the second year gives for that year

$$\frac{\sum (q_1 p_k)}{\sum (q_1 p_1)} \qquad (10)$$


Ratios of aggregates. Ratios of aggregates have long been recognized
as appropriate expressions of the changes of “composite prices,” such
as the cost of living. There has been growing recognition of their merits
as measures of general price movements. Analytically, as we have seen,
they are akin to ratios of averages, and for that reason the two types are
here considered together.

The two fundamental forms are (9) and (10) above. There is no 
general ground for preferring one to the other. A compromise is there- 
fore logically indicated, such as 


$$\sqrt{\frac{\sum (q_k p_k)}{\sum (q_k p_1)} \times \frac{\sum (q_1 p_k)}{\sum (q_1 p_1)}} \qquad (11)$$

Professor Irving Fisher holds, on weighty grounds, that formula
(11) is the “ideal” index number. He has shown that it gives results
intermediate between those given by forms which, compared with each
other, have opposed types of biased error, and that, in general, the more
trustworthy an index-number formula is, the more closely its results
approximate those given by formula (11).¹
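A short sketch may make the three forms concrete. It assumes that (9) is the ratio of aggregates with stated-year quantities as weights, that (10) uses basing-year quantities, and that (11) is their geometric mean; the data and function names are hypothetical. In later usage the forms (10) and (9) are commonly called the Laspeyres and Paasche indexes.

```python
from math import sqrt

def aggregate(q, p):
    """Sum of quantity times price over all commodities."""
    return sum(qi * pi for qi, pi in zip(q, p))

def formula_9(p1, pk, qk):
    # Stated-year quantities as weights.
    return aggregate(qk, pk) / aggregate(qk, p1)

def formula_10(p1, pk, q1):
    # Basing-year quantities as weights.
    return aggregate(q1, pk) / aggregate(q1, p1)

def formula_11(p1, pk, q1, qk):
    # Geometric mean of the two fundamental forms.
    return sqrt(formula_9(p1, pk, qk) * formula_10(p1, pk, q1))

# Hypothetical prices and quantities for the basing year (1) and stated year (k).
p1, pk = [2.0, 5.0, 10.0], [3.0, 4.0, 12.0]
q1, qk = [10.0, 4.0, 1.0], [8.0, 5.0, 1.0]

f9 = formula_9(p1, pk, qk)
f10 = formula_10(p1, pk, q1)
f11 = formula_11(p1, pk, q1, qk)
print(f"(9) {f9:.4f}  (10) {f10:.4f}  (11) {f11:.4f}")
```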
A compromise between formulas (9) and (10) may also be effected by 
using geometric means of the weights rather than of the formulas 
themselves. This procedure gives: 


$$\frac{\sum \sqrt{q_1 q_k}\; p_k}{\sum \sqrt{q_1 q_k}\; p_1} \qquad (12)^{2}$$

As (9) and (10) are usually not far apart, substituting arithmetic for
geometric means in (11) and (12) has a negligible effect upon their accuracy
and makes them easier to compute. Formula (12) then becomes

$$\frac{\sum \left[ \tfrac{1}{2}(q_1 + q_k)\, p_k \right]}{\sum \left[ \tfrac{1}{2}(q_1 + q_k)\, p_1 \right]} \quad \text{or} \quad \frac{\sum \left[ (q_1 + q_k)\, p_k \right]}{\sum \left[ (q_1 + q_k)\, p_1 \right]} \qquad (13)$$


1 The Making of Index Numbers, passim.

2 This is identical with (4) (the best of the arithmetic averages of relatives)
weighted by the geometric means of the money values of quantities of goods at the
two periods; that is, by $\sqrt{q_1 p_1 \times q_k p_k}$.
When accuracy, simplicity, and ease of computation are all taken into
account, formula (13) appears to afford as good year-with-year comparisons as any
other single index number of prices.¹ Index numbers like (11) and (13)
encounter the practical difficulty, however, that (except in the study of
special fields, such as the prices of agricultural products) reliable figures
for even approximately complete annual, to say nothing of monthly,
weights are lacking.
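The near-agreement of the compromise forms can be illustrated as follows. The sketch assumes that (12) weights prices by the geometric means of the two years' quantities and that (13) weights them by the sums (equivalently, the arithmetic means) of those quantities; all figures and names are hypothetical.

```python
from math import sqrt

def aggregate(w, p):
    return sum(wi * pi for wi, pi in zip(w, p))

def formula_12(p1, pk, q1, qk):
    w = [sqrt(a * b) for a, b in zip(q1, qk)]   # geometric means of the weights
    return aggregate(w, pk) / aggregate(w, p1)

def formula_13(p1, pk, q1, qk):
    w = [a + b for a, b in zip(q1, qk)]          # arithmetic means; the factor 1/2 cancels
    return aggregate(w, pk) / aggregate(w, p1)

# Hypothetical prices and quantities for the basing year (1) and stated year (k).
p1, pk = [2.0, 5.0, 10.0], [3.0, 4.0, 12.0]
q1, qk = [10.0, 4.0, 1.0], [8.0, 5.0, 1.0]

f9 = aggregate(qk, pk) / aggregate(qk, p1)
f10 = aggregate(q1, pk) / aggregate(q1, p1)
f11 = sqrt(f9 * f10)
f12 = formula_12(p1, pk, q1, qk)
f13 = formula_13(p1, pk, q1, qk)
print(f"(11) {f11:.4f}  (12) {f12:.4f}  (13) {f13:.4f}")
```

With these invented figures the three results differ only in the fourth decimal place, and (13) necessarily lies between (9) and (10).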

Another practical difficulty with compromise forms like (11) and (13) is
that although they afford probably the most accurate year-with-year
comparisons, they are not so well adapted to the construction of series
of successive index numbers.² Chain index numbers, in which the weights
and prices used are those of contiguous years, and fixed-base index
numbers, computed by these formulas, will not agree. The fixed-base
method is to be preferred by reason of the way in which the chain
method accumulates error. But under most conditions the simple
aggregative with fixed weights (10) is to be preferred when a self-consistent
series, rather than year-by-year comparisons, or comparisons
of successive years with the basing year, is the desideratum. Such
numbers, however, should be checked from time to time by (9), (11),
or (13) and, if necessary, their weights revised. Moreover, there is
little difference between the accuracy of aggregative and geometric
types of index numbers. Peculiarities of the available data may indicate
that, in some particular use, a weighted geometric average should
be preferred.
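The disagreement between chain and fixed-base results, and the self-consistency of the aggregative with constant weights, can be seen in a three-year example. This is a sketch only; the prices, quantities, and function names are hypothetical.

```python
from math import isclose, sqrt

def aggregate(q, p):
    return sum(qi * pi for qi, pi in zip(q, p))

def fixed_weight_index(q, p_from, p_to):
    """Ratio of aggregates with one constant set of weights q."""
    return aggregate(q, p_to) / aggregate(q, p_from)

def ideal_index(p_from, p_to, q_from, q_to):
    """Geometric mean of the two ratios of aggregates, as in formula (11)."""
    return sqrt(fixed_weight_index(q_from, p_from, p_to)
                * fixed_weight_index(q_to, p_from, p_to))

# Hypothetical prices and quantities for three years.
p = {1: [2.0, 5.0, 10.0], 2: [3.0, 4.0, 12.0], 3: [2.5, 6.0, 11.0]}
q = {1: [10.0, 4.0, 1.0], 2: [8.0, 5.0, 1.0], 3: [9.0, 3.0, 2.0]}

# Formula (11): chained links versus direct comparison with the basing year.
ideal_chained = ideal_index(p[1], p[2], q[1], q[2]) * ideal_index(p[2], p[3], q[2], q[3])
ideal_direct = ideal_index(p[1], p[3], q[1], q[3])

# Constant first-year weights: links multiply out to the direct figure.
fixed_chained = fixed_weight_index(q[1], p[1], p[2]) * fixed_weight_index(q[1], p[2], p[3])
fixed_direct = fixed_weight_index(q[1], p[1], p[3])

print(f"ideal: chained {ideal_chained:.4f}, direct {ideal_direct:.4f}")
print(f"fixed: chained {fixed_chained:.4f}, direct {fixed_direct:.4f}")
```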

When weights are not available, ratios of aggregates are not trustworthy,
and recourse must be had to other methods. Although there is
no large difference in the results given by (4), (6), and (7), the other
advantages of the geometric average (7), and especially the fact that it is
self-consistent (that is, independent of the base), probably entitle it to
preference, even though (4) may be, in general, slightly more trustworthy
for year-with-year comparisons.


1 Professor Fisher ranks (13) practically as high as (11). It has also the weighty 
approval of Mr. C. M. Walsh and Professors Alfred Marshall and F. Y. Edgeworth. 
In practice it seems to agree more closely with (11) than does (12) and it is probably 
somewhat more accurate than (12). Cf. Fisher, op. cit., pp. 401-07. 

2 Cf. the findings of Professor W. M. Persons, Review of Economic Statistics, Prel. vol. 
2, pp. 112, 113 (May, 1921). The difficulty is not with the particular formulas, 
which are probably the best of their kind. No aggregative index number with chang- 
ing weights can meet the so-called “‘circular test”; that is, the test of self-consistency. 
METHODS OF WEIGHTING


The effect of weighting. It has sometimes been held that the weight- 
ing of averages of relative prices may be dispensed with, provided that 
the number of series of price quotations used is relatively large. This 
contention would be well founded if there were no correlation between 
the importance of a commodity and its tendency to rise or fall in price, 
and further, if price variations always fell within a fairly narrow range 
in respect to magnitude. It is the failure of the second rather than the 
first of these conditions that makes weighting desirable. The un- 
weighted index number is too sensitive to abnormal variations of rela- 
tively unimportant prices and is too little influenced by large variations 
of important prices. 

Professor Mitchell infers, from a study of standard index numbers, 
that, except in abnormal years, weighting seldom makes a difference 
of 10 per cent. But this, as he suggests, is a much larger margin of 
error than is allowable in a good index number. Furthermore, the 
problem has significance only for averages of relatives. Ratios of 
aggregates are of necessity weighted. 

But weighting need not be precise. Round sums or even rough esti- 
mates will often serve the purpose about as well as precise figures. 
Professor Fisher has suggested that weighting to the nearest power of 
10, effected merely by moving the decimal point, is sufficiently accurate 
for most purposes. Nor is it always necessary to weight all of the 
series. Weighting should give their due importance, and no more, to 
large variations, whether these be abnormal variations of the prices of 
unimportant commodities, or smaller variations of the prices of impor- 
tant commodities. 
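The effect of rough weighting can be tried out on a small example. The sketch below assumes that “nearest power of 10” means nearest on a logarithmic scale, which is one possible reading; the relatives, weights, and function names are hypothetical.

```python
from math import log10

def nearest_power_of_ten(w):
    """Round a positive weight to the nearest power of 10 on a log scale (assumed reading)."""
    return 10.0 ** round(log10(w))

def weighted_average_of_relatives(rel, weights):
    return sum(w * r for r, w in zip(rel, weights)) / sum(weights)

# Hypothetical price relatives and precise weights for four commodities.
rel = [1.50, 0.80, 1.20, 1.10]
weights = [300.0, 40.0, 2500.0, 8.0]

rough = [nearest_power_of_ten(w) for w in weights]
exact_index = weighted_average_of_relatives(rel, weights)
rough_index = weighted_average_of_relatives(rel, rough)
print(rough)
print(f"exact weights {exact_index:.4f}, rough weights {rough_index:.4f}")
```

How closely the two results agree depends on the data; the example shows only how the comparison may be made.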

What weights are best? Weights must be selected with reference, 
first, to the type of formula used and, second, to the purpose of the par- 
ticular index number. Thus in aggregative index numbers of prices 
the weights are necessarily physical units. Aggregative index numbers 
of the cost of living should likewise be weighted by physical quantities. 
In practice the component units of such index numbers are often ab
initio “sums expended”; that is, prices weighted by quantities.

The proper weighting of the geometric average is a more difficult 


1 Bulletin of the U.S. Bureau of Labor Statistics, No. 284, p. 60. His comparisons
are not wholly satisfactory, however, for the system of weighting employed is such 
as to change the type of the index numbers he studies. But other evidence, includ- 
ing his own comparisons of Dun’s, Bradstreet’s, and the Bureau of Labor’s index 
numbers, supports his conclusion. 
problem. In practice, unless there is a marked positive or negative
correlation of the p’s and q’s, it is generally best to take values (such as
pq) as weights, so that the weighted geometric, with constant weights,
becomes

$$\sqrt[\sum pq]{\left(\frac{p_k}{p_1}\right)^{pq} \times \left(\frac{p'_k}{p'_1}\right)^{p'q'} \times \cdots (n \text{ terms})} \qquad (14)$$
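In computation the weighted geometric is most easily handled through logarithms. The sketch below assumes that the weights are constant money values and that the root is taken to the sum of the weights; the data and function names are hypothetical.

```python
from math import exp, isclose, log, prod

def weighted_geometric(p1, pk, weights):
    """Weighted geometric average of the relatives p_k/p_1, computed via logarithms."""
    total = sum(weights)
    return exp(sum(w * log(b / a) for a, b, w in zip(p1, pk, weights)) / total)

# Hypothetical prices, with basing-year values q1 * p1 used as constant weights.
p1, pk = [2.0, 5.0, 10.0], [3.0, 4.0, 12.0]
q1 = [10.0, 4.0, 1.0]
values = [q * p for q, p in zip(q1, p1)]

weighted = weighted_geometric(p1, pk, values)
unweighted = weighted_geometric(p1, pk, [1.0, 1.0, 1.0])
print(f"weighted {weighted:.4f}, unweighted {unweighted:.4f}")
```

With equal weights the expression reduces to the unweighted geometric average (7).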


The same general principles of weighting hold in dealing with special 
types of prices, such as wages and interest rates. In index numbers of 
wages, however, constant weights give untrustworthy results, for
increasing wages are likely to be correlated with increasing numbers of
wage-earners.¹

In constructing index numbers of the physical volume of trade, or 
of production or consumption, where the data run in terms of hetero- 
geneous physical units, the choice of weights must be determined on 
different principles. When, as is often the case, there is no correlation 
between the importance of the different variables and their rates of change, 
and when the number of series is fairly large — say, twenty or more — 
an unweighted average of relatives, such as the median or the geometric 
(7), will generally give fairly reliable results. Otherwise different sorts of 
goods must be reduced to the only practicable common measure, money
value, a procedure consistent with some uses of such index numbers,
but not with all. Weighted index numbers may then be formed by any of
the reliable aggregative or geometric formulas, such as (9), (10), (11), (13),
(14), with the important difference that the p’s and q’s are interchanged.²

Just what should weights represent? Quantities (or values) consumed? 
or produced? or exchanged? For index numbers of the cost of living 
and of retail prices in general weighting by amounts consumed is indi- 
cated. For certain types of studies of the influence of changes of the 
quantity of money upon the price level, weighting of wholesale (and 
of retail) prices by amounts exchanged is desirable. But index numbers 
of wholesale prices probably serve the broadest range of interests, 
including the interests of economists, of business men, and, in general, of 
citizens in their dual rôles of producers and consumers, when they are
weighted according to amounts produced. 

1 For other special problems of index numbers of wages see A. L. Bowley, Elements
of Statistics (fourth ed., 1920), chap. IX, and references there given.

2 In using the weighted geometric (14), “values” (such as pq) are probably
the best practicable weights when the variables averaged are physical quantities.
Cf. E. E. Day, “An Index of the Physical Volume of Production,” Review of Economic
Statistics, Prel. vol. 2 (Sept., 1920), p. 255.

Whatever the basis of weighting, it is often desirable that commodities
should be classified in sub-groups, selected according to industries
represented, or according as they are raw materials or finished products,
or imported or exportable goods or goods sold mainly in the domestic
market, or by other significant criteria. It is often quite as important
to compare the fluctuations and trends of different groups of prices as
to be informed of the movement of the general price level. The separate
weighted indexes (if of the aggregative type) may generally be combined
into weighted general index numbers without introducing error.¹

Biased weighting. When sums of money or values (such as pq) are
used as weights, care must be taken to avoid bias. Prices are factors 
in the weights, so that values for the later of any two periods compared 
are commonly directly correlated with price changes. The correlation 
between price changes and values of the earlier of any two periods com- 
pared is inverse. In the special case of the arithmetic average of rela- 
tives, however, weighting by basing-year values largely avoids bias, for 


$$\frac{\sum \left( q_1 p_1 \cdot \dfrac{p_2}{p_1} \right)}{\sum (q_1 p_1)} = \frac{\sum (q_1 p_2)}{\sum (q_1 p_1)},$$

which is the aggregative formula (10).² Similarly, weighting the harmonic
average of relatives by stated-year values (such as q2p2) gives the
other fundamental aggregative formula (9). But weighting the arithmetic
average by stated-year values or weighting the harmonic average
by basing-year values gives wholly unreliable results, for in each case the
bias of the weighting reinforces the bias inherent in the unweighted form.
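Both identities can be checked numerically. The sketch below uses hypothetical prices and quantities; the function names are illustrative.

```python
from math import isclose

def weighted_arithmetic_of_relatives(p1, p2, w):
    return sum(wi * (b / a) for a, b, wi in zip(p1, p2, w)) / sum(w)

def weighted_harmonic_of_relatives(p1, p2, w):
    return sum(w) / sum(wi * (a / b) for a, b, wi in zip(p1, p2, w))

# Hypothetical prices and quantities for the basing year (1) and stated year (2).
p1, p2 = [2.0, 5.0, 10.0], [3.0, 4.0, 12.0]
q1, q2 = [10.0, 4.0, 1.0], [8.0, 5.0, 1.0]

base_values = [q * p for q, p in zip(q1, p1)]      # q1 * p1
stated_values = [q * p for q, p in zip(q2, p2)]    # q2 * p2

formula_10 = sum(q * p for q, p in zip(q1, p2)) / sum(base_values)
formula_9 = sum(stated_values) / sum(q * p for q, p in zip(q2, p1))

arithmetic_base = weighted_arithmetic_of_relatives(p1, p2, base_values)
harmonic_stated = weighted_harmonic_of_relatives(p1, p2, stated_values)
print(isclose(arithmetic_base, formula_10), isclose(harmonic_stated, formula_9))
```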

The geometric average, because its weights are exponents, is particu- 
larly sensitive to biased weighting. Using either basing-year or stated- 
year values as constant weights will introduce a considerable element of 
error, especially when there has been a marked general upward or down- 
ward movement of prices. The error may be lessened by using means 
(preferably geometric) of basing-year and stated-year values, or, for 
series of index numbers, averages (preferably geometric) of values for 
the years covered. Similar precautions are desirable in constructing 
weighted index numbers of physical production or of the physical 
volume of trade, by reason of the presence of q as a double factor. 

In general, there is very little weight bias in ratios of aggregates, 
unless there is a marked degree of correlation between the p’s and q’s. 


1 For certain difficulties that may be encountered in practice, however, see Mitchell,
op. cit., p. 67.

2 In this manner, it will be noted, ratios of aggregates are related to averages of
relatives, as they are to ratios of averages.
In constructing index numbers of the cost of living, weighted in proportion
to the relative importance of different types of consumable goods
in family budgets, the use of a “crossed” formula,¹ such as (11), or a
“crossed-weighted” formula, such as (13), is desirable, because of
the inverse correlation commonly found between prices and quantities
of goods consumed. For wholesale prices constant weights are less
objectionable, because the correlation, positive or negative, of wholesale
prices and quantities of goods exchanged is generally small. With
a reasonably large number of series of price quotations formula (10),
the simple aggregative with constant weights, gives trustworthy results
over a fairly long period of years.²


THE ACCURACY OF INDEX NUMBERS 


Substantial errors have often been put into index numbers by im- 
proper methods of construction. The errors inherent in the best formu- 
las, however, are exceedingly small, as is shown by the close agreement 
of the results they give. 

Like other averages, index numbers gain in accuracy when the num- 
ber of constituent items is increased. The probable errors of sampling 
of unweighted averages of relatives may be computed by the ordinary 
rules. Professor Truman L. Kelley* suggests as a measure of the prob- 


able error of an index number of the aggregative type .6745¢ 4" = 


(r being the coefficient of correlation between the series of index numbers 
for two random halves into which the series of quotations is divided, 
and o being the mean of the standard deviations of the two sub-series). 


1 Professor Irving Fisher suggests that weight bias may be eliminated by using 
the mean of a given formula and of a formula with an equal but opposed weight bias. 
This opposed formula, or “factor antithesis,’ is found by interchanging the p’s and 
q’s in the given formula, and dividing the result into the ratio, 

$$\frac{\sum (q_k p_k)}{\sum (q_1 p_1)}.$$
Except in the case of ratios of aggregates, this method generally leads to cumber- 
some formulas. Results practically as good can be obtained by “‘crossing’”’ (that 
is, by taking the mean of) weights rather than formulas. 

2In 1922 the U.S. Bureau of Labor Statistics substituted weights based on data 
for 1919 for weights based on 1909 data in its index number of wholesale prices (327 
commodities, formula 10). With the 1909 weighting the increase of prices from 
1909 to 1919 was reported as 219 per cent. With the weights of 1919 the increase 
reported was larger, but by less than 3 per cent. Using formula (11) or (13) would
have made a difference of less than 1.5 per cent. Cf. Fisher, op. cit., p. 369. 

3 Statistical Method (1923), p. 338. 
Professor Irving Fisher¹ has constructed index numbers of the prices
of from 3 to 200 commodities equably apportioned, so far as possible,
among the various distinctive classes. His results (percentage standard
deviations) are as follows: 100 commodities, deviation 1.78; 50,
2.05; 25, 1.61; 12, 2.64; 6, 4.31; 3, 3.65.² He infers that “to reduce
the error by half we must multiply the number of commodities not
by four [as by the ordinary square-root rule] but by thirty-five.” These
results are largely accounted for, however, he suggests, by the fact that
the least important commodities were discarded first. By extrapolating
his results graphically he estimated the probable error of the index number
of the complete sample of 200 commodities to be about 1.5 per cent.
Professor Kelley’s method gave 1.3 per cent.

“ Seldom, however,” Professor Fisher concludes, “are index numbers 
of much value unless they consist of more than 20 commodities; and 
50 is a much better number. After 50, the improvement obtained from 
increasing the number of commodities is gradual and it is doubtful if 
the gain from increasing the number beyond 200 is ordinarily worth the 
extra trouble and expense.”³

It is exceedingly important, however, that the quotations or other 
statistics used be a representative sample. More depends upon what 
commodities are included than upon their number. In constructing 
and using index numbers it is well to observe the effects of including or 
excluding some of the more heavily weighted or more variable items. 
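One simple way of observing such effects is to recompute the index with each item omitted in turn. The sketch below does this for the aggregative form with constant weights; the data and function names are hypothetical.

```python
def aggregative_index(p1, pk, q):
    """Ratio of aggregates with constant weights q."""
    return sum(w * b for w, b in zip(q, pk)) / sum(w * a for w, a in zip(q, p1))

def leave_one_out(p1, pk, q):
    """Index recomputed with each commodity excluded in turn."""
    results = []
    for i in range(len(q)):
        keep = [j for j in range(len(q)) if j != i]
        results.append(aggregative_index([p1[j] for j in keep],
                                         [pk[j] for j in keep],
                                         [q[j] for j in keep]))
    return results

# Hypothetical prices and constant weights for three commodities.
p1, pk = [2.0, 5.0, 10.0], [3.0, 4.0, 12.0]
q = [10.0, 4.0, 1.0]

full = aggregative_index(p1, pk, q)
without_each = leave_one_out(p1, pk, q)
for i, value in enumerate(without_each):
    print(f"omitting item {i}: {value:.4f} (full index {full:.4f})")
```

A large shift on omitting a single item suggests that the index depends heavily on that item and that the sample deserves closer scrutiny.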


1 The Making of Index Numbers, p. 337. 

2 Professor Mitchell’s somewhat similar experiments with large and small index 
numbers (Bulletin of the U.S. Bureau of Labor Statistics, No. 284, pp. 34-41) lead 
him to put somewhat more emphasis on the difficulty of securing a representative 
sample in a small index number. The smaller the index number, it is clear, the 
greater is the care with which the commodities that enter into it must be selected. 

3 Professor W. M. Persons has found that much of the cyclical fluctuation, as dis-
tinguished from the trend, of index numbers of wholesale prices can be accounted 
for by the movements of a small number of prices. For use in forecasting business 
conditions, he has constructed a serviceable index number of the prices of only ten 
commodities. Cf. Review of Economic Statistics, November, 1921. 
In compiling a bibliography to accompany the Handbook no attempt
has been made to present an exhaustive list of the writings, or even the 
more important writings, on the subject. The purpose has been rather 
to present, for the guidance of users of the Handbook, references to those 
treatises and memoirs which bear directly upon and amplify the methods 
developed in the text. 

It is undoubtedly desirable that every title in the bibliography be 
followed by a synopsis and critical remarks. The task of providing 
such notes in any scientific subject is a burdensome one, and it is 
peculiarly difficult in the science of statistics because of the recent ex- 
ceedingly rapid growth and consequent inadequate critical study of 
methodology. For the purposes of the Handbook, we have confined 
our summaries and critical remarks to certain treatises. 

A large number of current scientific journals contain some papers 
on statistical methods but the following journals may be especially men- 
tioned as devoted to the advancement of such methods: 


Biometrika; Cambridge, England, The University Press. 

Journal of the American Statistical Association; Concord, N. H., The 
Rumford Press. (Prior to June, 1922, the Journal was entitled the 
Quarterly Publications of the American Statistical Association.) 

Journal of the Royal Statistical Society; London, published by the
Society.

Metron; Padua, Tipografia Industrie Grafiche Italiane Padova. 

Review of Economic Statistics; Cambridge, Harvard University Press. 

In the bibliography the names of many journals are abbreviated, but 
it is believed that the abbreviations require no explanation except pos- 
sibly as follows: 

Biom. = Biometrika. 

J. A. S. A. = Journal of the American Statistical Association. 

Q. P. A. S. A. = Quarterly Publications of the American Statistical
Association.

J. R. S. S. = Journal of the Royal Statistical Society.
and 1902.

Anderson, O. von. Nochmals über “The Elimination of Spurious Correlation due
to Position in Time and Space.” Biom., vol. 10 (1914).

Bailey, W. B. Modern Social Conditions. N.Y., Century, 1906.

Bailey, W. B., and Cummings, J. Statistics. Boston, McClurg, 1917.

Barnett, G. E. Index Numbers of the Total Cost of Living. Quar. Jour. Econ.,
vol. 35 (1921).

Benini, R. Principii di statistica metodologica. Torino, Unione tipografico-Edi-

Bertillon, J. Cours élémentaire de statistique. Paris, Soc. d’éd. Sci., 1895.

Beveridge, Sir W. H. Wheat Prices and Rainfall in Western Europe. J. R. S. S.,
vol. 85 (1922).

Biom., vol. 4 (1905).

Blakeman, J., and Pearson, K. On the Probable Error of the Coefficient of Mean

statistischen Masszahlen). Leipzig, Teubner, 1906.

A treatise devoted largely to the theory of the flow of populations.

Block, Maurice. Traité théorique et pratique de statistique. Paris, Guillaumin,
1886.

Boole, G. A Treatise on the Calculus of Finite Differences. 3d ed., London,
Macmillan, 1872.

Bortkiewicz, L. von. Anwendung der Wahrscheinlichkeitsrechnung auf Statistik.
Encyk. der math. Wiss., Bd. 1, H. 6; and Encyc. des sci. math., tome 1, vol.

—— Das Gesetz der kleinen Zahlen. Leipzig, Teubner, 1898.

—— Kritische Betrachtungen zur theoretischen Statistik. Jahrb. für Nat. Ök. u.
Stat. (3), vol. 8 (1894).

Bowley, A. L. An Elementary Manual of Statistics. London, Macdonald Evans,
1910.

—— Elements of Statistics. London, P. S. King, 4th ed., 1920.


The work is presented in two parts, of which the first is in the main non-mathe- 
matical, and the second chiefly theoretical. The illustrations are drawn chiefly 
from economic subjects. 


—— The Measurement of Changes in the Cost of Living. J. R.S.S., vol. 82 (1919). 

—— The Measurement of Groups and Series. London, C. & E. Layton, 1903. 

Brinton, W.C. Graphic Methods for Presenting Facts. N.Y., Eng. Mag. Co., 1914. 

Brown, W., and Thomson, G. H. The Essentials of Mental Measurement.
Cambridge University Press, 1921.


Bruns, H. Wahrscheinlichkeitsrechnung und Kollektivmasslehre. Leipzig, Teubner, 
1905. 


A treatise devoted mainly to the representation of an arbitrary frequency func- 
tion. 


Brunt, David. The Combination of Observations. Cambridge University Press,
1917.

A valuable introductory treatment of the method of least squares. The first
and chief part of the book is devoted to the theory and method of adjustment of
observations, and is written in such a way as to render it useful both for the study
of the foundations of the method and for the criticism of actual practice. It is not
designed, however, as a laboratory manual of least squares. The later chapters
give excellent summaries of the methods of curve-fitting, correlation, harmonic
analysis, and periodogram analysis.


Buchanan, James. Osculatory Interpolation by Central Differences, with an
Application to Life-Table Construction. Jour. of the Inst. of Actuaries, vol. 42 (1908).

Byerly, W. E. Fourier’s Series. Boston, Ginn, 1893.

Carse, G. A., and Shearer, G. A Course in Fourier’s Analysis and Periodogram
Analysis. London, Bell, 1915.

Carver, H. C. The Mathematical Representation of Frequency Distributions.
Q. P. A. S. A., vol. 17 (1921).

Cave, B. M. Cf. Soper, H. E.

Cave, B. M., and Pearson, K. Numerical Illustrations of the Variate Difference
Correlation Method. Biom., vol. 10 (1915).

Charlier, C. V. L. Contributions to the Mathematical Theory of Statistics.
Arkiv för matematik, astronomi och fysik, vols. 7, 8, 9.

—— Researches into the Theory of Probability. Lund, Contributions from the As-

pincott, 1863.

Crofton, M. W. On the Proof of the Law of Errors of Observations. Phil. Trans.,

Crum, W. L. Cycles of Interest Rates on Commercial Paper. Rev. Econ. Stat., vol.
5 (1923).

—— A Measure of Dispersion for Ordered Series. Q. P. A. S. A., vol. 17 (1921).

—— A Special Application of Partial Correlation. Q. P. A. S. A., vol. 17 (1921).

—— The Determination of Secular Trend. J. A. S. A., vol. 18 (1922).

—— The Use of the Median in Determining Seasonal Variation. J. A. S. A., vol.
18 (1923).

—— The Resemblance between the Ordinate of a Periodogram and the Correlation
Coefficient. J. A. S. A., vol. 18 (1923).

Cummings, J. Cf. Bailey, W. B.

Czuber, E. Die statistischen Forschungsmethoden. Wien, L. W. Seidel & Sohn,
1921.

This book resembles, in its methods, Yule’s Introduction to the Theory of
Statistics.

—— Wahrscheinlichkeitsrechnung. Leipzig, I, 3d ed., 1914; II, 3d ed., 1921.

Volume I, Wahrscheinlichkeitstheorie, Fehlerausgleichung, Kollektivmasslehre.

Volume II, Mathematische Statistik, mathematische Grundlagen der
Lebensversicherung.

One of the leading treatises on statistical methods and error theory.

tions. 2d ed., N.Y., Wiley, 1904.

Elderton, W. P. Frequency Curves and Correlation. London, C. & E. Layton, 1906.

An exposition of the method of moments and the Pearson system of frequency
curve-fitting. The final chapters treat of various methods of measuring correlation.
Although the book is written with actuarial problems in view, it is readily
understandable by the general student.

—— Graduation and Analysis of a Sickness Table. Biom., vol. 2 (1902-03).

—— Interpolation by Finite Differences. Biom., vol. 2 (1902-03).

This book presents the work of Fechner on the problem of generalized frequency
theory.

Fechner, G. T. Über den Ausgangswerth der kleinsten Abweichungssumme, u. s. w.,
Leipzig, Abh. der Kg. sächs. Gesell. der Wiss., vol. 18 (1878).

Filon, L. N. G. Cf. Pearson, K.
TABLES OF PROBABILITY FUNCTIONS

Areas under the Curve $y = \phi(t) = \frac{1}{\sqrt{2\pi}} e^{-t^2/2}$, the Function $\phi(t)$,
and its Second, Third, and Fourth Derivatives¹

  t      ∫₀ᵗ φ(t) dt    φ⁽⁴⁾(t)

 0.00     .000000       1.19683
 0.01     .003989       1.19653
 0.02     .007978       1.19563
 0.03     .011966       1.19414
 0.04     .015953       1.19204

 0.05     .019938       1.18936
 0.06     .023922       1.18608
 0.07     .027903       1.18221
 0.08     .031881       1.17775
 0.09     .035856       1.17271

 0.10     .039828       1.16708
 0.11     .043795       1.16088
 0.12     .047758       1.15410
 0.13     .051717       1.14676
 0.14     .055670       1.13885

 0.15     .059618       1.13038
 0.16     .063560       1.12137
 0.17     .067495       1.11180
 0.18     .071424       1.10170
 0.19     .075345       1.09106

 0.20     .079260       1.07990
 0.21     .083166       1.06823
 0.22     .087064       1.05604
 0.23     .090954       1.04335
 0.24     .094835       1.03018

 0.25     .098706       1.01651
 0.26     .102568       1.00238
 0.27     .106420       0.98778
 0.28     .110261       0.97273
 0.29     .114092       0.95723

 0.30     .117911       0.94130
 0.31     .121720       0.92495
 0.32     .125516       0.90819
 0.33     .129300       0.89103
 0.34     .133072       0.87348

 0.35     .136831       0.85555
 0.36     .140576       0.83726
 0.37     .144309       0.81862
 0.38     .148027       0.79963
 0.39     .151732       0.78032

 0.40     .155422       0.76070
 0.41     .159097       0.74077
 0.42     .162757       0.72056
 0.43     .166402       0.70007
 0.44     .170031       0.67932

 0.45     .173645       0.65832
 0.46     .177242       0.63709
 0.47     .180822       0.61564
 0.48     .184386       0.59398
 0.49     .187933       0.57213


1 The values of φ(t) and of its derivatives are taken from Tables of Applied
Mathematics, by James W. Glover, through the courtesy of the publisher, George Wahr.
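Tabulated values of this kind can be checked with a few lines of code. The sketch below is not the method by which the tables were computed; it uses the error function of the Python standard library for the area and the closed form $(t^4 - 6t^2 + 3)\,\phi(t)$ for the fourth derivative. The function names are illustrative.

```python
from math import erf, exp, pi, sqrt

def phi(t):
    """Ordinate of the normal curve."""
    return exp(-t * t / 2) / sqrt(2 * pi)

def area(t):
    """Integral of phi from 0 to t."""
    return 0.5 * erf(t / sqrt(2))

def phi4(t):
    """Fourth derivative of phi: (t^4 - 6 t^2 + 3) * phi(t)."""
    return (t**4 - 6 * t**2 + 3) * phi(t)

# Print every tenth line of the range t = 0.00 to 0.49.
for i in range(0, 50, 10):
    t = i / 100
    print(f"{t:.2f}  {area(t):.6f}  {phi4(t):.5f}")
```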
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Areas under the Curve y = $(t) = oe e~f/2, the Function $(t), 


20 


and its Second, Third, and Fourth Derivatives 


if $(t)dt $(f) $2)(t) $2)(t) $(9(2) 
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1 


2a 


and its Second, Third, and Fourth Derivatives 


Areas under the Curve y = $(f) = e~@/2, the Function $(f), 


a2) | gan $(2) 
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Areas under the Curve y = $(f) = ape 35 the Function $(¢), 


20 
and its Second, Third, and Fourth Derivatives 


rower $O(t) $(2) 


-70425 
.69933 
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Areas under the Curve y =¢4(t) = =A e~?/2, the Function $(#), 


2 
and its Second, Third, and Fourth Dzrivatives 


fe > t)dt (t) 2)(t) $3 (t) $1) 
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Areas under the Curve y = $(t) = “Ee e/2, the Function $(), 


27 


and its Second, Third, and Fourth Derivatives 


$(t) $(t) $®)(t) $4 (t) 
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1 


2a 


and its Second, Third, and Fourth Derivatives 


Areas under the Curve y = $(t) = e-f/2, the Function $(f), 


2)(t) $)(£) (tf) 


  t      $\int_0^t \phi(t)\,dt$

3.00     .498650
3.01     .498694
3.02     .498736
3.03     .498777
3.04     .498817

3.05     .498856
3.06     .498893
3.07     .498930
3.08     .498965
3.09     .498999

3.10     .499032
3.11     .499065
3.12     .499096
3.13     .499126
3.14     .499155

3.15     .499184
3.16     .499211
3.17     .499238
3.18     .499264
3.19     .499289

3.20     .499313
3.21     .499336
3.22     .499359
3.23     .499381
3.24     .499402

3.25     .499423
3.26     .499443
3.27     .499462
3.28     .499481
3.29     .499499

3.30     .499517
3.31     .499534
3.32     .499550
3.33     .499566
3.34     .499581

3.35     .499596
3.36     .499610
3.37     .499624
3.38     .499638
3.39     .499650

3.40     .499663
3.41     .499675
3.42     .499687
3.43     .499698
3.44     .499709

3.45     .499720
3.46     .499730
3.47     .499740
3.48     .499749
3.49     .499758


.07074


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A posteriori, probability, 16. 

A priori, probability, 15. 

Absolute error, 2. 

Absolute values, mean of, 27. 

Accuracy of index numbers, 193. 

Addition formula, in probability, 16. 

Adjusted monthly indexes, in seasonal
variation, 152.

Aggregates, ratios of, 188. 

AM. See arithmetic mean.

Analysis, for a single known period, 167; for 
multiple periods, 169. 

Approximate expressions for the probability 
integral, 15. 

Arbitrary functions, the case for, 169. 

Argument, the, 35. 

Area of frequency rectangles, significance of,
22.

Area under a curve, the, 17. 

Areas, moments of, 69. 

Arithmetic mean, 4, 181; graphical meaning 
of, 23; computation of, 24; weighted, 24; 
form for calculation, 24. 

Averages applied to frequency distributions, 
23; graphical meaning of the arithmetic 
mean, 23; computation of the arithmetic 
mean, 24; weighted arithmetic mean, 24; 
form for calculation of arithmetic mean, 
24; geometric mean, 25; harmonic mean, 
25; graphical meaning of the median, 26; 
computation of the median, 26; the mode, 
26; empirical mode, 27. 

Averages, appropriate, selection of, 32; 
definitions of various kinds, 4; in deter- 
mination of parameters, method, 64. 

Averages of relatives; arithmetic average, 
181; biased error, 182; types of bases, 183; 
the harmonic average, 185; compromise 
index numbers, 186; the geometric aver- 
age, 186; the median, 187; ratios of, 187. 

Averaging, graduation by, 58. 


Barlow’s Tables, 1. 

Bases, types of, 183; moving, 183; fixed, 184. 

Bauschinger and Peters, tables of loga- 
rithms, 1. 

Bernoulli distribution, 83. 

Bernoulli numbers, the, 10. 

Bernoulli’s theorem, in probability, 16; ap- 
plied to random sampling, 72; probable 
error in number of successes, 73; proba- 
bility of deviation, 74. 

Bessel’s interpolation formula, 38; deriva- 
tives by, 51. 

Beta Function, the, 11. 

Biased error, 182. 

Biased weighting, 192. 


Binomial coefficients, 8; table of, 9. 
Binomial distribution, 83. 
Binomial theorem, 7, 8. 

Bi-serial r, 136. 

Blakeman, J., 131. 

Bécher, 156. 

Bortkewitsch, the law of small numbers, 75. 
Bowley, A. L., 114, 191. 

Bracket rank method, the, 133. 
Bremiker, tables of logarithms, 1. 
Brunt, D., 166, 169, 173. 
Buchanan, James, 45. 

Burgess’s Table, 14. 

Burn and Brown, 49. 

Byerly, W. E., 166. 


Calculating machines, 1. 

Calculation of mean deviation, 29; of 
standard deviation, 28; of correlation, 
124; of correlation ratio, 130. 

Carse, G. A., and Shearer, G., 168, 171, 172. 

Carver, H. C., 113. 

Center of gravity, 4. 

Centroid, the, 4. 

Chain index numbers, 183. 

Chain relatives, in correlation of time, 152. 

“Change of scale,” 7. 

Charlier, C. V. L., 84; coefficient of dis- 
turbancy, 88; check, 29. 

Charlier theory, 114. 

Chauvenet, William, 38, 39, 40, 45. 

CHM. See contra-harmonic mean. 

Class frequency, 21. 

Class interval, 21. 

Class mark, 21. 

Coefficient of association, the, 135; of cor- 
relation, 121, 163; probable error, 162; of 
disturbancy, Charlier, 88; of variability, 
29. 

Coefficients, binomial, 8; generalized corre- 
lation, 139; of correlation, 163; of colliga- 
tion, 135. 

Colligation, coefficient of, 135. 

Combinations, 8. 

Commercial paper rate, periodogram analy- 
sis of, 174. 

Comparison of corrected series, 160. 

Compromise index numbers, 186. 

Computing machines, 1. 

Constants, frequency, 95. 

Constants of the normal curve, 11. 

Contingency, method of, 135; coefficient of 
mean square, 135. 

Continuous variables, 20.

Contra-harmonic mean, 5. 

Corrected series, comparison and correlation 
of, 160. 
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Corrections for grouping and errors of ob- 
servation, 131. 

Correlation, simple, 120-38; meaning of, 
120; correlation coefficient, 121; numer- 
ical computation of r, 124; regression, 126; 
scatter diagram, 127; form for calculation 
of correlation ratios, 130; correlation from 
ranks, 132; ties in rank, 133; difficulties in 
interpretation of, 138; partial and multi- 
ple, 139; ratios, 146; of time series, 150-65; 
seasonal variation, 151; secular trend, 158; 
elimination of seasonal variation and secu- 
lar trend, 160; comparison and correlation 
of corrected series, 160. 

Correlation ratio, 129-31; non-linear re- 
gression, 129; form for calculation, 130; 
probable error of η, 131; test of linearity of
regression, 131; corrections for grouping 
and errors of observation, 131. 

Correlation table, 122. 

Crathorne, A. R., 120, 130. 

Crelle’s Table, 1. 

Criteria for the normal curve of error, 99; of 
Lexis and Charlier, the, 87-88; applica- 
tions to statistical data, 88-91. 

Criterion, Pearson’s, 78. 

Crofton, M. W., 97. 

Crum, W. L., 155. 

Cumulative frequency distribution, 23. 

Curve, the area under a, 17; constants of the
normal, 11; equation of the normal, 12;
the normal, 96.

Curve-fitting, the problem of, 62; by the 
method of least squares and the method of 
moments, 62; selection of type of equa- 
tion, 63; determination of parameters, 64. 

Cyclical movements, 161. 

Czuber, E., 27, 84, 97. 


Day, E. E., 191. 

Derivatives in terms of differences, 50; by 
Newton’s formula, 50; by Stirling’s for- 
mula, 51; by Bessel’s formula, 51; by 
Everett’s formula, 52. 

Deviation, mean or average, 29. 

Diagonal difference table, 37. 

Difference equation graduation, 111. 

Difference table, fundamental relations in, 
35, 37. 

Differences, interpolation by, 34. 

Discrete variables, 20. 

Dispersion, or Variability, 27; measures 
of, 27; standard deviation, 27; graphical 
meaning of σ, 28; form for calculation of
standard deviation, 28; coefficient of varia- 
bility, 29; mean or average deviation, 29; 
graphical meaning of mean deviation, 29; 
calculation of mean deviation, 29; quartile 
deviation, 31; formulas for probable errors, 
31. 

Dispersion of statistical ratios, 82; of relative 
frequencies, 82. See Types of dispersion. 
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Distribution; Bernoulli or binomial, 83; 
Poisson, 84; Lexis, 85. See Frequency 
distribution. 


Edgeworth, F. Y., 114, 189. 

Elderton, W. P., 79, 110, 164. 

Elimination of seasonal variation and secular
trend, 160.

Ellis, 20. 

Empirical mode, the, 27. 

Encke’s Table, 14. 

Equation of the normal curve, 12. 

Error, absolute, 2; relative, 2; biased, 182; in 
computation, propagation of, 3; probable 
meaning of, 76; in tables, 47; normal 
curve of, 99. 

Errors of observation, correction for, 131. 

Euler-Maclaurin formula, the, 18. 

Eta (η), mean absolute error, 12; correlation
ratio, 131.

Everett, J. D., formula, 39; derivatives by, 
52. 

Expectation, mathematical, 16. 


Factorial n, 9. 

Fifth difference osculatory interpolation, 43. 

Filon, L. N. G., 95. 

Finite integrals in terms of differences, 53. 

Fisher, Arne, 84, 90. 

Fisher, Irving, 186, 188, 189, 193, 194. 

Fisher, R. A., 78, 81. 

Form for the calculation of r, 124; for the 
calculation of correlation ratios, 130; for 
the calculation of standard deviation, 28. 

Formula for n factorial (Stirling’s), 9; for
π (Wallis’s), 10.

Formulas, graduation and smoothing, 58; 
interpolation, 36; osculatory interpolation, 
43; for probable errors in certain averages, 
31; in probability, fundamental, 15; a 
priori, 15; addition, 16; multiplication, 16; 
mathematical expectation, 16; Bernoulli’s 
theorem, 16; a posteriori, 16; for r, 125. 

Foster, R. M., 6. 

Fourier analysis, 166. 

Fourier’s series, 19. 

Frequency curves, 22, 92-119; modification 
of frequency moments, 92; probable 
errors of frequency moments, 95; the 
normal curve, 96; Pearson’s generalized 
frequency curves, 103; difference equation
graduation, 111; the generalized normal
curve (Charlier theory), 114.

Frequency distribution, 20; class interval 
and class mark, 21; class frequency, 21; 
frequency polygon, 21, 22; frequency 
rectangles, 21, 22; cumulative, 23; sig- 
nificance of area of frequency rectangles, 
22; frequency curve, 22; cumulative fre- 
quency distribution, 23; Ogive, 23, 153. 

Frequency moments and constants, probable 
errors of, 95. 
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Frequency polygon and rectangles, 21, 22. 
Function, Gamma, 10; Beta, 11. 
Functions, probability, tables of, 209-16. 


Gamma function, 10. 

Gauss, interpolation formula, 37; normal 
error curve, 11. 

Generalized correlation ratios, 146-49; par- 
tial and multiple correlation ratios, 146; 
sample three-variable problem, 148. 

Generalized frequency curves, Pearson’s, 103. 

Generalized normal curve, Charlier theory 
of, 114. 

Geometric mean, 4, 25, 185. 

Glover, J. W., 41, 42, 44, 45, 56; Tables for 
Applied Mathematics, 2. 

GM. See Geometric mean. 

Goodness of fit, test of, 78; Pearson’s cri- 
terion, 78. 

Graduation, difference equation, 111. 

Graduation and smoothing formulas, 53; 
by averaging, 58; Woolhouse, Higham
and Spencer, 59.

Graphical meaning of σ, 23; of mean deviation,
29.

Graphical method, in determination of
parameters, 64.

Grouping, corrections for, 131. 


Halves, subdivision into, 45. 

Harmonic analysis for known periods, 153; 
nature of periodicity, 166; analysis for a 
single known period, 167; simplified
calculation, 168; analysis for multiple
periods, 169; the case for arbitrary
functions, 169.

Harmonic average, 185; mean, 5, 25. 

Harris, J. A., 122.

Hart, W. L., 155. 

Henderson, Robert, 48, 61. 

Henry, Alfred, 42, 57. 

Higham graduation formula, 59. 

Histogram, 22. 

HM. See Harmonic mean. 

Horizontal difference table, 35. 

Huntington, E. V., Handbook of Mathe- 
matics for Engineers, 1; 129, 125. 

Hypergeometric series, the, 11. 


Index numbers, 181-94; averages of rela- 
tives, 181; types of bases, 183; moving 
base, 183; chain, 183; compromise, 183; 
ratios of averages and of aggregates, 187; 
“ideal,” 188; methods of weighting, 190;
the accuracy of index numbers, 193. 

Indexes of seasonal variation, computation 
of, 154; comparison of, 157. 

Infinite Series, 18; Taylor’s theorem, 18; 
Maclaurin’s theorem, 19; Fourier’s series,
19. 

Integrals, finite, 53. 

Interest rates, 178. 
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Interpolation, underlying ideas of, 34; 
horizontal difference table, 35; argument, 
35; leading differences, 35; fundamental 
relations in difference table, 35; formulas, 
36; Newton’s, 36; diagonal difference 
table, 37; formulas of Gauss and Stirling, 
37; Bessel’s, 38; Everett’s, 39; choice of, 
40; osculatory, 43. See Systematic
Interpolation.

Intervals, subdivision of, 41. 


Jackson, D., 6. 
Joffe, A. S., 48. 
Jones, D. Caradog, 110. 


Karup, Johannes, 45. 

Kelley, T. L., 131, 142, 143, 193. 
Keynes, J. M., 84. 

King, George, 41, 45, 57, 61. 


Lag of one series, with respect to another, in 
correlation of time, 161. 

Lagrange’s formula, 47. 

Landre, Corneille L., 59. 

Law of small numbers, Bortkewitsch, 75. 

Leading differences, the, 35. 

Least squares, 4; method in determining 
parameters, 64, 65-68. 

Lexis, W., 82; distribution, 85; ratio, 87.

Lexis and Charlier, criteria of, 88. 

Lidstone, George J., 45, 61. 

Linearity of regression, test of, 131. 

Link relatives, 151, 184. 

Lipka, J., 63. 

Logarithms of numbers, tables of, 1. 

Loomis, Elias, 39. 

Lubbock, J. W., 54; formula, 54.


Macauley, F. R., 184. 

Machines, computing, 1. 

Maclaurin’s theorem, 19. 

Marshall, Alfred, 189.

Mathematical expectation, 16. 

Maxima and minima, by inspection, 176. 

Mean, moments about the, 70. 

Mean, arithmetic, 4, 22; geometric, 4, 25; 
harmonic, 5, 25; contra-harmonic, 5; root-
mean-square, 5; median, 6; mode, 7;
weighted, 7. 

Mean absolute error, 13. 

Mean deviation, 29. 

Mean square error of estimate, 128. 

Meaning, notation and formulas (general- 
ized correlation coefficients), 139. 

Means, weighted, 7. 

Measure of precision, 12; of dispersion, 27. 

Median, graphical meaning of, 26; computation
of, 26; in index numbers, 187; in
seasonal variation, 152; of a set of quan- 
tities, 6. 

Medico-Actuarial Investigation, 21. 

Memoranda, mathematical, 1. 
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Merriman, Mansfield, 96. 

Method of contingency, 135; of least
squares, 4.

Mid-rank method, the, 133.

Mitchell, W. C., 190, 194.

Mode, of a set of quantities, 7, 26; empirical,
27.

Modification of frequency moments, 92.

Modulus of normal curve, 12.

Moments, method of, 65; determination of
parameters by method of, 68.

Moments of the area of the normal curve, 15;
method in determining parameters, 65,
68-70.

Moore, H. L., 164.

“Most probable value,” 82.

Movements, cyclical, 161.

Multiplication formula, in probability, 16.

Multiplication tables, 1.


Newton’s interpolation formula, 36; deriva- 
tives by, 50. 

Non-linear regression, 129. 

Normal curve, the, 96; moments of the area 
of, 15; generalized, 114. 

Normal dispersion, 87. 

Normal distribution, 29. 

Normal equations, 66. 

Normal error curve, Gauss’s, 11. 

Normal probability curve, 12. 

Normal value, 160. 

“Notation by powers of 10,” 4. 

Numerical computation: slide rules, tables 
and computing machines, 1; absolute and 
relative errors, 2; propagation of error in, 
3; rejection of superfluous figures, 3; 
definitions of various kinds of means or 
averages, 4; permutations and combina- 
tions, 7; the binomial theorem, 8; Stir- 
ling’s formula, 9; the Bernoulli numbers, 
10; the Gamma Function, 10; Gauss’s 
normal error curve, or probability curve, 
11; fundamental formulas in probability, 
15; quadrature formulas, for numerical 
integration, 17; infinite series, 18; of the 
correlation coefficient, 122. 


Observational equations, 66. 

Ogburn, W. F., 184. 

Ogive, 23. 

Oppolzer’s Table, 14. 

Ordinates, not equidistant, 47; moments of, 
68. 

“Oscillation” method, the, 173.

Osculatory interpolation formulas, 43. 


Pairman, Eleanor, 94. 

Parameters, determination by method of 
least squares, 65-68; by method of 
moments, 68-70. 

Partial and multiple correlation, 139-49; 
meaning, notation and formulas, 139;
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sample three-variable problem, 143, 148; 
generalized correlation ratios, 146-49. 

Pearson’s generalized frequency curves, 103. 

Pearson, Karl, 2, 27, 62, 69, 81, 94, 95, 104, 
110, 121, 129, 131, 132, 135, 136, 137, 141, 
147, 148, 164; curves, 65; criterion, 78. 

Period, determination of, 173. 

Periodicity, nature of, 166. 

Periodogram, the, 170; introductory dis- 
cussion, 170; changes effected by sum- 
ming, 172; determination of a period, 173; 
the “oscillation” method, 173. 

Periodogram Analysis, 166-80; Harmonic 
analysis for known periods, 166; the 
periodogram, 170; applications, 174; 
tables and figures, 177-80; critical re- 
marks on method, 175. 

Permutations and combinations, 7. 

Persons, W. M., 151, 164, 189, 194. 

Peter’s table, 1. 

Pierce, B. O., 10. 

Pierpont, J., 169. 

Poisson distribution, 84; exponential limit, 
application of, 75. 

Polygon, frequency, 21, 22. 

Probable error, 31; in number of successes in 
random sampling, 73; formulas for, 31, 77; 
in an average, 76; of relative frequency, 
77; of AM, 77; of median, 77; of standard 
deviation, 77; of quartile, 77; of semi- 
quartile range, 77; of coefficient of varia- 
tion, 77; of second, third and fourth mo- 
ments, 77; of coefficient of correlation, 77; 
of regression coefficient, 77; of β₁ and β₂,
77.

Probable error of η, 131; of r, 125; of frequency
moments and constants, 95.

Probability, fundamental formulas in, 15; 
a priori, 15; addition formula, 16; multi- 
plication formula, 16; mathematical ex- 
pectation, 16; Bernoulli’s theorem, 16; a 
posteriori, 16. 

Probability, integral, 14; tables described, 
14; approximate expressions for, 15; of 
deviation, 74. 

Probability curve, 11. 

Probability functions, tables of, 209-16. 

Product-moment coefficient, 121. 

Propagation of error, in computation, 3. 


Quadrature formulas, for numerical integra- 
tion, 17; trapezoidal rule, 17; Simpson’s 
rule, 17; the “three-eighths” rule, 17;
Weddle’s rule, 18; Sheppard’s rule XI, 18; 
the Euler-Maclaurin formula, 18. 

Quartile deviation, 31. 


r, the correlation coefficient, 121; probable 
error of, 125; formulas for, 121, 125; form
for the calculation of, 124; bi-serial, 136. 

Radius of gyration, 5. 

Random sampling, 71; Bernoulli theorem,
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72; probable error in number of successes, 
73; probability of deviation, 74. 

Ranks, correlation from, 132; ties in, 133. 

Rational integral algebraic functions, 49. 

Ratios of averages, 187; of aggregates, 188; 
partial and multiple correlation, 146; 
statistical, types of dispersion of, 82. 

Rectangles, frequency, 21, 22. 

Regression, 126-28; lines of, 126; mean
square error of estimate, 128; standard
deviation of arrays, 128. 

Rejection of superfluous figures, 3. 
Relationships between time series, sum- 
mary of methods of setting forth, 165. 

Relative error, 2. 

Relative frequencies, 82. 

Relative numbers, 181. 

Relatives, 181; averages of, 181. 

Rice, Herbert L., 39, 53. 

Rietz, H. L., 120, 128. 

RMS. See Root-mean-square. 

Roa, Emeterio, 116. 

Root-mean-square, 5; of deviations, 27. 

Runge, C., 168. 

Running, T. R., 62, 63, 168. 

Rutherford and Geiger, 21, 80. 


Scatter-diagrams, 120, 127. 

Schuster, 171, 173, 175; periodogram, 166. 

SD. See Standard deviation. 

Seasonal variation, 151; indexes of, 158. 

Secrist, H., 30. 

Secular trend, 158; elimination of, 160. 

Selected points, in determination of para- 
meters, method, 64. 

Semi-interquartile range, 31. 

Series, Fourier’s, 19. 

Sheppard, W. F., 48, 95; tables, 14; correc- 
tions, 29, 70, 94; rule XI, 18. 

Simple correlation, 120-38. 

Simpson’s rule, 17. 

Skewness, 33, 104. 

Slide rules, 1.

Slutsky, E., 81. 

Smoothing formulas. See Graduation. 

Solution of normal equations, example of, 67. 

Soper, H. E., 75, 137. 

Spearman, C., 133. 

Spearman’s foot-rule, 133. 

Spencer graduation formula, 59. 

Sprague, Thomas Bond, 45. 

Squares, Barlow’s, tables of, 1. 

Standard deviation, 5, 27; of arrays, 128; of a 
given frequency, form for calculation of, 28. 

Statistical ratios, in random sampling, 72; 
types of dispersion, 82. 

Statistics, material of, 20. 

Stirling’s formula, 9, 37; derivatives by, 51.

Stump of a distribution, graduation of, 113. 

Subdivision into intervals, 41. 

Subnormal dispersion, 87. 

Summation formulas, 53; finite integrals in 
terms of differences, 53; Lubbock’s formula,
54; Woolhouse’s formula, 56.

Summing, changes effected by, 172. 

Superfluous figures, rejection of, 3. 

Supernormal dispersion, 87. 

Systematic Interpolation by continuous 
methods, 41; subdivision by Newton's 
formula, 41; osculatory interpolation 
formulas, 43; fifth difference osculatory 
interpolation, 41; subdivision into halves, 
45; ordinates not equidistant, 47; ra- 
tional integral algebraic functions, 49. 


Tables, errors in, 47. 

Tables of probability functions, 209-16. 

Taylor's theorem, 18. 

Test, of goodness of fit (Pearson’s criterion),
78; of linearity of regression, 131. 

Tetrachoric correlation, 133. 

Theorem, Taylor’s, 18; Maclaurin’s, 19. 

Thompson, A. J., 41. 

“Three-eighths” rule, the, 17.

Three-variable problem, sample, 143, 148. 

Ties in rank, 133. 

Time series, correlation of, 150-65; seasonal 
variation, 151; secular trend, 158; elimi- 
nation of seasonal variation and secular 
trend, 160; comparison and correlation of 
corrected series, 160. 

Trachtenberg, H. L., 175. 

Trapezoidal rule, the, 17. 

Trend, seasonal, 158; elimination of, 160. 

Tuttle, L., 176. 

Type of equation in curve-fitting, 63. 

Types of dispersion of statistical ratios, 82; 
relative frequencies, 82; Bernoulli or
binomial distribution, 83; Poisson dis- 
tribution, 84; Lexis distribution, 85. 


Variability, 27; coefficient of, 29. See Dis- 
persion. 

Variables, 20. 

Variate difference method, 164. 


Venn, John, 20. 


Yule, G. Udny, 25, 30, 81, 126, 131, 188, 143; 
coefficients, 135. 


Wallis’s formula for π, 10.

Walsh, C. M., 186, 189. 

Weddle’s rule, 18. 

Weighted arithmetic mean, 24. 

Weighting, methods of, 190; the effect of, 
190; selection of weights, 190; biased 
weighting, 192. 

Whitaker, Lucy, 73. 

Woolhouse, W. S. B., 57; graduation formula,
56, 59.


Young, Allyn A., 25, 166, 176, 187. 


Zimmerman Table, 1. 
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